Transfer Between Instrument and Contact Flight Training 1


Malcolm L. Ritchie and Archer L. Michael 


University of Illinois 


To meet the demands of an all-weather Air 
Force, pilots must be trained to operate air- 
craft proficiently by the use of two radically 
different sets of visual displays. Operation 
with visual reference to the outside world is 
called contact flight. Operation with sole ref- 
erence to instruments within the aircraft is 
called instrument flight. Pilots have tradi- 
tionally been trained to proficiency on con- 
tact flight before beginning instrument train- 
ing. There is, however, no a priori necessity 
for this order of events in flight training. It 
can be understood from a historical point of 
view, but no evidence has been collected to 
support any particular arrangement of con- 
tact and instrument flight in a flight-training 
syllabus. It is tacitly assumed that transfer 
of training will occur both ways, but since 
contact flying is presumed to be “easier” than 
instrument flying it is put first in all curricula. 

The fact that it is possible to generate a 
particular sequence of aircraft control move- 
ments, and hence a particular aircraft behav- 
ior, in response to either of two different sets 
of visual inputs makes a convenient situation 
for studying the relative merits of such in- 
puts. The amount of practice required to 
achieve some specified level of proficiency can 
be used as a measure of the excellence of
displays. Clearly, if more practice is required
to fly a set of maneuvers by instruments than 
by contact, there is room for improvement in 
the instrument displays. 

This study was designed to investigate four
questions about instrument and contact flying,
as follows:

1. Which is easier to learn, contact or instrument
flight?

2. Does learning contact flying make it
easier to learn instrument flying?

3. Does learning instrument flying make it
easier to learn contact flying?

4. If both contact and instrument flying
are to be learned, which should be taught first
for the least amount of total learning time?

1 This research was supported in part by the
United States Air Force under Contract AF 33(038)-25726,
monitored by the Air Force Personnel and
Training Research Center. Permission is granted for
reproduction, translation, publication, use, and disposal
in whole and in part by or for the United
States Government.


Method 


Experimental design. Twenty-two subjects (Ss) 
were divided by random assignment into two groups, 
each of which learned both a contact and an instru- 
ment flying task. The procedure for the groups dif- 
fered only in the order in which they learned the 
tasks. The plan was as follows: 


              Task 1                 Task 2
Group I       Learned instruments    Learned contact
Group II      Learned contact        Learned instruments


This amounts to two basic transfer designs with each 
group acting as a control for the other. 

Subjects and experimenters. The Ss were 22 mem- 
bers of the Air Force and Naval ROTC at the Uni- 
versity of Illinois. All were volunteers. Motivation 
was increased by the promise of a short flight around 
the local area as a reward for faithful participation. 

The two authors were experimenters (Es). One 
flew seven Ss in Group I and eight in Group II. 
The other flew four Ss in Group I and three in 
Group II. A detailed syllabus and several flights to- 
gether were used to enhance reliability between Es. 

The tasks. For each condition S was required to 
learn to fly the airplane straight and level and to 
make level 180° turns. In the contact condition ac- 
cess to the flight instruments was prevented, and S 
had to rely on visual reference to the world outside 
the airplane. In the instrument condition visual ac- 
cess to the world outside was prevented, and S had 
to rely on visual reference to the flight instruments. 
After preliminary instruction S was required to learn 
to fly the airplane straight and level and to make 
level 180° turns. When these two maneuvers had 
been learned to a specified level of proficiency on 
one condition, exactly the same maneuvers were 
learned to the same criterion on the other condition. 

Apparatus. A Piper “Pacer” airplane, number
N7733K, was used for all the flights in the experi- 
ment. The airplane is a four-place, 125-horsepower 
stock model with wheel control. The only modifi- 
cation consisted in the addition of the instruments 
noted below and in provisions for controlling S’s 
visual field (restricting vision to the inside of the 
plane or preventing visual access to the instruments). 
The plane was trimmed to fly “hands off” at 100 
miles per hour and all maneuvers were begun 
straight and level at this speed. 

Control over the visual displays available to S
was provided by amber windshield and window cov- 
ers, a blue-green instrument cover, blue blind-flying 
goggles, and red dark-adaptation goggles. When the 
amber windshield and window covers were installed 
and S wore blue goggles, vision was restricted to the 
inside of the plane. When the blue-green instrument 
cover was in place and S wore red dark-adaptation 
goggles, he could see everything normally available 
except the flight instruments. The E had unre- 
stricted visibility inside and outside at all times. 

The aircraft was equipped with the instruments 
necessary for “attitude” instrument flying, that is, 
control of the plane’s performance primarily through 
controlling nose and wing position relative to the 
earth. The instruments used were: artificial horizon, 
type AM-5376-1; directional gyro, type AN-5739-1; 
sensitive altimeter, type B-3; turn and slip indicator, 
type A-5; vertical speed indicator, type A-6; and 
airspeed indicator, type 544. 

Procedure. The sequence of the training exercises 
used is shown in Table 1. 

On every flight the windshield or instrument covers 
and S’s goggles were in place before take-off and re- 
mained so until after landing. This was done so that 
S saw either one display or the other, but never the 
two together. Group I Ss, for example, did not see
outside the airplane until they completed all the in- 
strument exercises and began contact work. Upon 
starting contact they did not see the instruments in 
flight until the experiment was completed. 

In the flight procedure the instructor would take 
off, climb to smooth air and good horizon, level off 
and trim the airplane to fly “hands off” at 100 miles 
per hour. If smooth air or a required horizon could 
not be located, the plane was returned to the line 
without S’s having operated the controls. Most trials 
were made between 5,000 feet and 8,000 feet. Some 
were made as low as 3,000 feet and some as high as 
12,000 feet, as weather conditions required. When 
the plane was trimmed, the instructor would explain 
and demonstrate the maneuver to be executed. Any 
questions were answered with a view to maximum 
instruction. The S’s trials were timed with a stop 
watch. During a performance trial E remained si- 
lent, unless an error had already occurred rendering 
the trial invalid. In the latter case the instructor 
might point out errors or give further instruction, 
but would not interfere with the time on controls. 
After each trial E flew the airplane to set up for the 
next trial. Between trials he performed such in- 
struction as seemed called for, answering any ques- 
tions. The instructor made all descents and landings. 

Flights were scheduled to last about one hour each. 
No S completed the syllabus in fewer than three 
flights, and none took more than six flights (includ- 
ing aborted missions in which for reasons of weather 
S did not operate the controls). 

Criteria. The last two of the exercises in each 
task—straight and level, and level turns—were used 
as performance measures. The accepted perform- 
ance criterion for straight and level was three con- 
secutive two-minute trials within the following 


Table 1

Group I

              Task I, Instruments                    Task II, Contact

Pretraining   Preliminary ground instruction         Demonstration of contact flight
              Preliminary instrument instruction       references
                (ground)                             Return to straight and level
              Demonstration of instrument flight
                references
              Effect of controls
              Return to straight and level

Performance   Straight and level                     Straight and level
              Level turns                            Level turns

Group II

              Task I, Contact                        Task II, Instruments

Pretraining   Preliminary ground instruction         Preliminary instrument instruction
              Demonstration of contact flight          (ground)
                references                           Demonstration of instrument flight
              Effect of controls                       references
              Return to straight and level           Return to straight and level

Performance   Straight and level                     Straight and level
              Level turns                            Level turns









Table 2

Raw Score Results: Number of Trials Required to Reach the Criterion

Task I

Group I, Instrument Condition

   S      Total
   1       13
   2       18
   3       23
   4       16
   5        3
   6       21
   7       31
   8       24
   9       11
  10       25
  11       14
 Total    199

Group II, Contact Condition (Ss 12 to 22)

 Total    111

[The E* (experimenter with whom S flew), Straight & Level, and Turns
columns, and the individual Group II entries, are not legible in the source.]


limits: airspeed, plus or minus 5 mph; direction,
plus or minus 5°; coordination, plus or minus ½
ball. The criterion for level turns was the same as
that for straight and level with the addition of the
following limit: bank, plus or minus 5°.

The S was allowed to miss the new direction on
roll-out from the level turn by as much as 20°, provided
that no reversal of turn greater than 5° occurred
in the process of correcting back to the
proper course.
Results 


The results are shown in Table 2. The en- 
tries are numbers of two-minute trials re- 
quired to reach the learning criterion. The 
three criterion trials were not included in the 
scores. 
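
The scoring rule can be illustrated with a minimal sketch. It assumes that each two-minute trial has already been judged as inside or outside the stated limits; the function name and the sample record are hypothetical illustrations, not data from the experiment.

```python
def trials_to_criterion(within_limits, run_length=3):
    """Count the trials that precede the first run of `run_length`
    consecutive successful trials. The criterion trials themselves are
    not counted. Returns None if the criterion is never reached."""
    streak = 0
    for index, ok in enumerate(within_limits):
        streak = streak + 1 if ok else 0
        if streak == run_length:
            # index is the 0-based position of the last criterion trial
            return index + 1 - run_length
    return None


# Hypothetical record: two failures, a success, a failure, then three successes.
example = [False, False, True, False, True, True, True]
score = trials_to_criterion(example)
print(score)  # 4
```

Under this reading, a score of 0 means that S met the criterion on the first three trials attempted.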

Relative difficulty of the tasks. Table 2 
shows that it required 199 trials for all Ss in 
Group I to learn instruments as a first task. 
It took 111 trials for Group II Ss to learn 


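Table 2 (continued)

Task II

Group I, Contact Condition

   S      Total    Grand Total
   1        9         22
   2        0         18
   3       18         41
   4        1         17
   5        3          6
   6        0         21
   7       14         45
   8        9         33
   9        1         12
  10        4         29
  11        0         14
 Total     59        258

Group II, Instrument Condition

   S      Total
  12       18
  13       31
  14       12
  15       20
  16       15
  17       26
  18       38
  19       33
  20       15
  21       16
  22       18
 Total    242        Grand Total 353

[The Straight & Level and Turns columns, and the individual Group II
grand totals, are not legible in the source.]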

contact as a first task. Given the assumption 
that the groups were comparable, it may be 
stated that the contact task was learned in 
44% fewer trials. With the small number of 
Ss the difference between instruments and 
contact as first tasks was not statistically sig- 
nificant. When both tasks for both groups 
were considered, the differences between con- 
tact and instruments were highly significant 
(variance due to condition in Table 3). 
Examination of the total trials taken on 
each maneuver shows that straight and level 
was learned in fewer trials than turns in the 
contact condition. Turns took fewer trials 
than straight and level on instruments. Sign 
test comparisons 3 and 4 of Table 4 show sig- 
nificant differences on straight and level for 
the two conditions. Sign test comparisons 5 
and 6 show no real differences on turns. 

Table 3

Summary of Analysis of Variance

Source of                       Sum of                  Mean
Variation                       Squares        df       Square         F

Independent Observations
  Sequence                       52,302.03      1        52,302.03     1.50
  Residual between
    individuals                 696,142.91     20        34,807.15
  Total between
    individuals                 748,444.94     21

Correlated Observations
  Condition                     681,762.03      1       681,762.03    51.45**
  Trial                           1,191.84      1         1,191.84
  Condition × Trial              61,824.75      1        61,824.75     4.67*
  Residual within
    individuals                 251,754.88     19        13,250.26
  Total within
    individuals                 996,533.50     22

Note. To satisfy the assumptions of analysis of variance,
the raw scores were transformed to a logarithmic scale. The
formula for the transform was: Log score = [log (raw score
+ 11) - 1.0] × 1000.

* Significant at the .05 level.

** Significant at the .01 level.
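
As a check on the arithmetic of Table 3, the F ratios can be recomputed from the reported mean squares. The sketch below assumes the usual convention for this kind of design: Sequence is tested against the residual between individuals, and the within-individual effects against the residual within individuals. The variable names are illustrative only.

```python
# Mean squares as reported in Table 3 (log-transformed scores).
MS_RESIDUAL_BETWEEN = 34_807.15  # residual between individuals, df = 20
MS_RESIDUAL_WITHIN = 13_250.26   # residual within individuals, df = 19


def f_ratio(ms_effect, ms_error):
    """F is the effect mean square divided by its error mean square."""
    return ms_effect / ms_error


f_sequence = f_ratio(52_302.03, MS_RESIDUAL_BETWEEN)
f_condition = f_ratio(681_762.03, MS_RESIDUAL_WITHIN)
f_trial = f_ratio(1_191.84, MS_RESIDUAL_WITHIN)
f_interaction = f_ratio(61_824.75, MS_RESIDUAL_WITHIN)

for name, value in [
    ("Sequence", f_sequence),
    ("Condition", f_condition),
    ("Trial", f_trial),
    ("Condition x Trial", f_interaction),
]:
    print(f"{name}: F = {value:.2f}")
```

With these error terms the computed ratios agree with the tabled values of 1.50, 51.45, and 4.67; the Trial ratio, for which the table gives no entry, is less than 1.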


Thus a real difference between the two con- 
ditions was indicated in the straight and level 
maneuver. 

Transfer of training. Transfer of training 
was measured by the use of the following 
formulas: 


Transfer from instruments to contact

      Gp. II task 1 score - Gp. I task 2 score
    = ---------------------------------------- × 100
                 Gp. II task 1 score


Transfer from contact to instruments

      Gp. I task 1 score - Gp. II task 2 score
    = ---------------------------------------- × 100
                 Gp. I task 1 score





Group I learned instruments first and is the 
control group for instrument performance. 
Group II learned contact first and is the con- 
tact control. The formula expresses the dif- 
ference between control and transfer group 
performance on a given condition as a per- 
centage of the control group score. When 
the formulas were applied, the following 
transfer results were obtained:


Table 4

Nonparametric Comparisons

Mann-Whitney U Test
1. Total trials instrument Group I vs. contact Group II
2. Group I inst. trials vs. Group II inst. trials
3. Group I cont. trials vs. Group II cont. trials
4. Group I st. & level (inst.) vs. Group II st. & level (inst.)
5. Group I st. & level (cont.) vs. Group II st. & level (cont.)
6. Group I turns (inst.) vs. Group II turns (inst.)
7. Group I turns (cont.) vs. Group II turns (cont.)

Sign Test
1. Group I inst. trials vs. cont. trials
2. Group II inst. trials vs. cont. trials
3. Group I st. & level inst. vs. cont.
4. Group II st. & level inst. vs. cont.
5. Group I turns inst. vs. cont.
6. Group II turns inst. vs. cont.

[The numerical columns (first median, second median, and test
statistics) are not legible in the source.]





Transfer from contact to instruments = -22%

Transfer from instruments to contact = 47%
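
The two percentages can be reproduced from the group totals given in the text. In this sketch the first-task totals (199 and 111) and the two-task totals (258 and 353) are taken from the report; the second-task totals are obtained by subtraction, and the function and variable names are illustrative.

```python
def percent_transfer(control_score, transfer_score):
    """Difference between control and transfer group scores,
    expressed as a percentage of the control group score."""
    return (control_score - transfer_score) / control_score * 100


# Totals reported in the text.
group1_task1_instruments = 199  # Group I, instruments learned first
group2_task1_contact = 111      # Group II, contact learned first
group1_both_tasks = 258
group2_both_tasks = 353

# Second-task totals, derived by subtraction.
group1_task2_contact = group1_both_tasks - group1_task1_instruments  # 59
group2_task2_instruments = group2_both_tasks - group2_task1_contact  # 242

contact_to_instruments = percent_transfer(
    group1_task1_instruments, group2_task2_instruments
)
instruments_to_contact = percent_transfer(
    group2_task1_contact, group1_task2_contact
)

print(round(contact_to_instruments))  # -22
print(round(instruments_to_contact))  # 47
```

A negative value means that the transfer group needed more trials than the control group on the same condition.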


Under the assumption that the groups were 
equivalent, these results show that flight by 
instruments was made harder to learn when 
contact flight had been learned first. Learn- 
ing instruments first reduced the time re- 
quired to learn the contact task by almost 
one-half. The total difference in transfer effects
is significant at the .05 level (shown in
Table 3 as Condition × Trial interaction).

The relative efficiency of the orders. When
the instrument task was learned first (Group 
I), the total trials required to learn both tasks 
were 258 (Table 2). When the contact task 
was learned first (Group II), 353 trials were 
required to learn both tasks. This is a dif- 
ference of 25% in favor of learning instru- 
ments first. Although this 25% savings is the 
best estimate of the effect of order, the difference
was not statistically significant (t test).


Discussion 


The present experiment shows (a) that instrument
flying was harder to learn than contact,
and (b) that contact and instruments
had different transfer effects upon each other. 
The estimates of the transfer effects for the 
conditions of the experiment are -22% for
contact to instruments and 47% for instru- 
ments to contact. This difference in the di- 
rection of transfer would be expected to re- 
duce consistently the total learning time for 
both tasks when the instrument task is learned 
first, although this finding was not conclu- 
sively established. 

Some of the instruments used in this ex- 
periment have been evaluated individually 
(see Fitts, 3) and have been found to con- 
flict with established perceptual habits (popu- 
lation stereotypes). In order to learn to use 
such instruments the pilot must not only learn 
new display relationships but must also over- 
come the old habits. The present experiment 
confirms the indications of these individual 
instrument evaluation experiments by show- 
ing the instrument task to have been more 
difficult to learn than contact. That the two 
displays are incompatible was indicated by 
the poor transfer showing from contact to in- 
struments. 

To the extent that the transfer results may
be relied upon, it may be seen that the pres- 
ent procedure of having contact well learned 
before beginning instruments is at least waste- 
ful of learning efforts. At the extreme it is 
possible that a whole generation of instru- 
ment-hating pilots was created when they met 
instruments which appeared unnatural as they 
opposed well-established contact habits. The 
obvious solution is the development of instru- 
ments which are compatible with population 
stereotypes and with contact habits. (For 
purposes of transition they need also be com- 
patible with current instruments.) In the 
meantime, pilots must be trained to use the 
present instruments. The effects of incom- 
patibility have been shown to be reduced 
when instruments were learned first. Per- 
haps some other training procedure, such as 
interweaving very early contact training with 
instrument training, might produce results 
just as desirable. 


Summary 


Twenty-two Ss were taught two flight ma- 
neuvers on contact and on instruments. One 
group learned contact first, the other instru- 
ments first. Contact training took signifi- 
cantly fewer trials to learn. Transfer from 
contact to instruments was -22%. Transfer
from instruments to contact was 47%.
The variance due to transfer effects was sig- 
nificant at the .05 level. The results confirm 
indications from previous studies that these 
instruments involve learning habits which are 
incompatible with population stereotypes. Im- 
plications for training and for instrument de- 
sign are discussed. 


Received June 10, 1954. 


Predicting Code Proficiency of Radiotelegraphers by
Means of Aural Tests 


Edwin A. Fleishman 1


Air Force Personnel and Training Research Center 2
Lackland Air Force Base, Texas 


Men vary greatly in their ability to receive 
telegraphic code. Moreover, large individual 
differences in code reception continue to be 
demonstrated even after extensive training. 
There is a persistent problem of selecting men 
who can be trained to become efficient radio- 
telegraphers in the shortest possible time. 

There have been a number of studies which 
have attempted to predict code proficiency of 
radio operators by means of aptitude tests in 
military and other situations. These studies 
have been reviewed elsewhere by Taylor (6) 
and more recently by Creager (1). In gen- 
eral, these separate studies have indicated 
that predictions based solely on printed tests 
were low to moderate, but that considerable 
improvement results when certain aural code 
tests are used. The present paper describes 
an attempt to evaluate a number of such
auditory-perceptual tests in a single study.
More specifically, the study was directed to- 
ward predicting proficiency in telegraphic code 
reception in an Air Force training situation. 


Descriptions of the Tests 


Six of the experimental tests were adapta- 
tions of the Seashore Measures of Musical 
Talents (Series A) (5). Actually, these tests 
reflect measures of certain basic auditory- 
perceptual abilities that might be expected to 
have generality to other areas which involve 
discrimination of auditory stimuli. More- 
over, in some previous work with small sam- 
ples (e.g., 7), certain of the various subtests 
have shown some correlation with speed of 
telegraphic code reception. 


1 The valuable assistance of Julius G. Spratte dur- 
ing various phases of this study is gratefully ac- 
knowledged. 

2 This research was carried out under the Air Force 
Personnel and Training Research Center, Lackland 
Air Force Base, San Antonio, Texas, in support of 
Project 7700. Permission is granted for reproduc- 
tion, translation, publication, use, and disposal in 
whole or in part by or for the United States Gov- 
ernment. 


As traditionally given, the Seashore tests 
comprise three 12-inch, 78 RPM records (six 
sides). Directions are given orally and dem- 
onstrations are provided by the administrator 
before each subtest by playing examples from 
several portions of the record. The tests are 
usually given on an individual basis. In the 
present study the procedure was more stand- 
ardized and adapted for group administra- 
tion. The tests were recorded on high fidelity 
tape complete with standard instructions and 
demonstrations. A persistent problem with 
these tests is that they require careful atten- 
tion over prolonged periods of time in the 
presence of very monotonous stimuli. Special 
efforts were made in the instructions to keep 
the Ss’ attention and motivation at a high 
level. Complete dialogue recorded for each 
test may be found elsewhere (2). A brief 
description of each of the six tests follows: 

Pitch Discrimination: A series of 50 pairs 
of tones differing in pitch. The examinee 
prints H if the second tone is higher, and L 
if the second tone is lower in each pair. 

Loudness Discrimination: A series of 50 
pairs of tones differing in loudness (intensity 
or strength). The examinee prints S if the 
second tone is stronger, and W if the second 
tone is weaker in each pair. 

Rhythm Discrimination: A series of 30 
pairs of rhythmic patterns (beats within each 
pair presented in rapid succession). The ex- 
aminee prints S if the patterns in each pair 
are the same, and D if they are different. 

Time Discrimination: A series of 50 pairs 
of tones which differ in length. The examinee 
prints L if the second tone is longer, and S if 
the second tone is shorter in each pair. 

Timbre Discrimination: A series of 50 pairs 
of tones differing in timbre, or tone quality. 
The examinee prints S if the tones in each 
pair are the same, and D if they are different. 

Tonal Memory: A series of 30 pairs of tone 
patterns. In the second pattern of each pair 
one note is changed. The examinee listens to
the first pattern in each pair, and then marks
the number of the note that is changed in the
second pattern (e.g., a 1 if the first note is
changed, a 2 if the second note is changed,
etc.).

In each test, difficulty increases progres- 
sively as the test continues. For example, in 
the Pitch Discrimination Test, the frequency
differences to be discriminated in each pair
decrease; in the Time Discrimination Test
the lengths of the notes in each pair become
more nearly equal; in the Tonal Memory Test
the number of tones to be remembered in 
each pattern increases. Complete descrip- 
tions of the step intervals in difficulty em- 
ployed for each test and the scoring keys may 
be found elsewhere (5). Subjects recorded 
their answers on special blanks provided for 
each test. Short pauses after each series of 
ten items helped the examinees keep their 
place on the answer blank. The examinees 
were instructed to guess if necessary, but to 
record an answer for each pair of stimuli. 
Score on each test was the total number 
correct. 

Two other new experimental tests were de- 
veloped specifically for the study: 

Dot Perception Test (DPT): A series of
fifty five-signal groups, consisting of rapid
patterns of “dots” and “dashes.” For each
group the examinee simply marks (on a standard
IBM answer sheet) the number of “dots”
presented (1, 2, 3, 4, or 5) in each pattern.

Code Distraction Test (CDT): A series of 
150 signal groups of the kind presented in the 
Dot Perception Test. However, this time the 
signal groups are presented in the presence of 
additional and irrelevant background auditory 
signals. The examinee must respond (on an 
IBM answer sheet) according to the number 
of “dots” in the relevant code groups, trying 
to ignore or not be distracted by the back- 
ground “noise” in which they were imbedded. 

The rationale for developing these two tests 
was derived from an analysis of possible 
sources of difficulty in code reception. One 
difficulty frequently mentioned is the prob- 
lem of differentiating between “dots” and 
“dashes” when these are presented sequen- 
tially in rapid order within a pattern of 
sounds. It seemed possible that the ability 
to make this discrimination might be somewhat
different from the ability to discriminate
between two tones differing in duration
(as in the Time Discrimination Test de- 
scribed above). To give an extreme illustra- 
tion, the transmission of three “dots” in rapid 
succession may sound the same as the trans- 
mission of a continuous signal of the same 
duration. A more realistic situation, how- 
ever, would probably be more like this: three 
dots and a dash are transmitted, and the 
third dot is not perceived as a separate sig- 
nal; that is, the signal is perceived as two 
dots and a dash, or as a dot and two dashes. 
The Dot Perception Test was constructed to 
examine the predictive value of a measure of 
this kind of function.* 

Another possible source of difficulty in the 
radio operator’s job is that he usually must 
receive code under far from ideal conditions. 
The relevant code signals must be received in 
the presence of “noise” and other irrelevant 
signals. In the training course itself much 
practice is given in receiving code under these 
conditions. The Code Distraction Test was 
developed to see if individual differences in 
the ability to ignore such distraction before 
training would be predictive of later radio 
operator success. 

The stimulus signals for these two tests 
were recorded by a trained radio operator 
using a standard telegraph key. The key 
was used in conjunction with a small oscil- 
lator. The signal was then picked up through 
a communications receiver and fed into the 
tape recorder. This procedure eliminated key 
clicks, allowed adequate monitoring, and pro- 
vided realistic signals. 

In the Dot Perception Test the examinee 
was given five demonstration examples and 
15 practice examples. The test portion con- 
tained 50 items in which the number of “dots” 


Dr. R. W. Highland (AFPTRC, Keesler Air 
Force Base) recently has studied relationships be- 
tween certain objective features of Morse Code char- 
acters and receiving difficulty. In an unpublished 
study, he found that the variable most highly cor- 
related with receiving difficulty was “number of dots 
per character.” 

Actually, the standard amount of time occupied 
by a dot (called a baud) is the crucial unit in all 
code signals; for example, there are three bauds to 
a dash, each intracharacter space is one baud in 
length, and, of course, there is one baud in a dot. 
(one to five) varied from item to item in a
randomly determined order. In each item,
the “dots” always preceded the
“dashes” (of course, in an item with five
“dots” there were no “dashes”).
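
The structure of a Dot Perception Test item can be sketched as follows. The item layout (five signals, dots first, one to five dots) and the baud values for a dot, a dash, and an intracharacter space come from the description above; treating the gaps between the signals of an item as intracharacter spaces, and all function names, are assumptions made for illustration.

```python
import random

DOT, DASH = ".", "-"
BAUDS = {DOT: 1, DASH: 3}  # a dot lasts one baud, a dash three
GAP_BAUDS = 1              # assumed gap between successive signals


def make_item(n_dots, group_size=5):
    """Build one item: n_dots dots followed by dashes, group_size signals in all."""
    if not 1 <= n_dots <= group_size:
        raise ValueError("n_dots must be between 1 and group_size")
    return DOT * n_dots + DASH * (group_size - n_dots)


def duration_in_bauds(item):
    """Total length of an item, counting signals and the gaps between them."""
    signal = sum(BAUDS[s] for s in item)
    gaps = GAP_BAUDS * (len(item) - 1)
    return signal + gaps


def make_test(n_items=50, seed=0):
    """Draw the number of dots for each item at random (hypothetical procedure)."""
    rng = random.Random(seed)
    return [make_item(rng.randint(1, 5)) for _ in range(n_items)]


print(make_item(3))                     # ...--
print(duration_in_bauds(make_item(3)))  # 13
```

The correct response to an item is simply its number of dots, so the key for a generated test is `[item.count(DOT) for item in make_test()]`.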

In the Code Distraction Test the stimulus 
groups were of the same type as that used in 
the Dot Perception Test (five-signal groups). 
This was to minimize learning and to empha- 
size the attention aspects of the task. The 
distraction used was taken from regular prac- 
tice code tapes which contain actual code sent 
at a rapid rate. These practice tapes also in- 
clude other background sounds at odd in- 
tervals, e.g., a door slamming, the murmur of 
conversation, the occasional sound of outside 
traffic. The actual procedure used in record- 
ing this test was to play the stimulus items 
from one recorder and the background dis- 
traction from another recorder, onto a third 
tape. During the earlier part of the test 
(first 25 items), the distraction was more 
distinguishable from the stimulus items (not 
as intense and at a higher frequency). As 
the test progressed through the 150 items, an 
attempt was made to increase the “distrac- 
tion.” This was done by increasing the in- 
tensity of the distraction, and later by fluc- 
tuating the intensity and frequency of the 
distraction signals. Short, unpredictable interruptions
of the background signals appeared
particularly disturbing. Before the
test, Ss were given examples of the relevant 
stimuli, examples of the distraction stimuli 
alone, and examples of both kinds of signals 
together. Five practice items were provided. 
Subjects were told there would be no syste- 
matic pauses in the background signals, but 
as in the previous test there would be slightly 
longer pauses in the stimulus items after every 
ten items. Complete dialogue for the Dot 
Perception Test and the Code Distraction 
Test may be found elsewhere (2). 

In addition to the eight tests already de- 
scribed, two previously standardized tests 
were included in the analysis. One of these 
was the Army Radio Code Test (ARC), 
which is designed to measure the speed with 
which the examinee can learn certain code 
characters. The other was the Signal Corps 
Code Aptitude Test (SCCAT), which is a 
test of ability to perceive differences between 
code patterns. It was also possible to com- 
pare the predictions achieved by these aural 
tests with that achieved by the Radio Operator
Aptitude Index (ROAI), which at that
time was a weighted index based solely on 
printed tests. 


Table 1

Summary Data on the Experimental Radio Operator Variables Based on an Unrestricted
Sample of Basic Trainee Airmen
(N = 243)

                                       Approximate
                          No. of       Administration
                          Test         Time
      Test                Items        (Minutes)        Mean*     SD      Reliability**

  1.  Pitch                 50           7.5                                  .86
  2.  Loudness              50           7.5             36.65                .63
  3.  Rhythm                30                           25.05                .75
  4.  Time                  50                           38.56                .73
  5.  Timbre                50                           38.82                .79
  6.  Tonal Memory          30                           19.08                .88
  7.  Dot Perception        50                           36.71                .95
  8.  Code Distraction                                  107.76                .97
  9.  ARC                                                60.92                .98
 10.  SCCAT                                              51.17                .74
 11.  ROAI                                                5.19    1.99

[Blank cells are entries that are not legible in the source.]





* Scoring formula for variables 1 through 8 and for variable 10 was “number of rights”; variable 9 is scored R - ¼W;
variable 11 is a weighted index based on six printed tests.

** Split-half reliability estimates corrected for full length of each test.







Administration to Basic Trainee Airmen 


The preliminary versions of the experi- 
mental tests had been tried out on groups of 
airmen before casting them in the forms de- 
scribed above. Then, after the tests were 
recorded (as described), they were adminis- 
tered to 243 basic trainee airmen, using loud- 
speaker administration. The tests were ad- 
ministered to groups not exceeding 50 Ss at a 
time. 

Table 1 summarizes the means, standard 
deviations, and reliabilities obtained for the 
various tests, as well as the approximate ad- 
ministration times required for each test. 
The administration times include time re- 
quired for standard instruction, demonstra- 
tions, and practice periods. 

The data for ARC and SCCAT are com- 
parable to those generally obtained for un- 
restricted samples. The split-half reliability 
coefficients obtained for these two tests are 
also consistent with previous indications of 
their reliabilities. 

It appeared that the reliabilities of certain 
of the Seashore adaptations could be im- 
proved, but that the reliabilities were ade- 
quate for the present purposes. The reli- 
abilities of these tests are in the same general 
range as that obtained in the original stand- 
ardization based on adult samples (5). Of 
special interest are the high reliabilities evi- 
denced by the new experimental tests, Dot 
Perception and Code Distraction. Compara- 
tive split-half reliabilities of the ARC and 
SCCAT tests confirm previous indications 
based on test-retest reliability that the ARC 
is substantially more reliable than the SCCAT 
test. 


Administration to Radio Operator Trainees 


Procedure. The complete battery was adminis- 
tered to 400 entering radio operator trainees at 
Keesler Air Force Base, before any of the students 
had actually started the course. To the examinees 
the testing seemed to be part of school evaluation 
procedures and motivation appeared to be high. 
The number of examinees per session averaged 
around 70. In this phase of the study all the tests 
were administered through earphone headsets. The 
testing room was equipped with headsets for each 
student. Sound from the testing tapes was piped 
from the recorder through an amplifier to the in- 
dividual earphones. 

The tests were administered in the following order:
1, Pitch; 2, Loudness; 3, Rhythm; 4, Time; 5, Timbre;
6, Tonal Memory; (10-min. break); 7, Dot Perception;
8, Code Distraction; 9, ARC. All students had received
the SCCAT test previous to assignment.

The sample. At the time of the study, all stu- 
dents were selected for assignment to Radio Operator 
Training on the basis of their Radio Operator Apti- 
tude Index (ROAI), and their scores on the SCCAT. 
Thus, a double cut-off selection procedure was in 
effect. An airman had to achieve a minimum score 
on the ROAI as well as a minimum score on the 
SCCAT to qualify for the course.* The sample, 
therefore, was subject to considerable curtailment. 

The criterion. The criterion of success in the 
Radio Operator Course is relatively unambiguous. 
Most of the attrition rate is associated with lack of 
progress in learning to receive International Morse 
Code. Very few students are eliminated for in- 
ability to send code. Periodic checks on receiving 
speed of students are made under relatively stand- 
ard conditions and the receiving speeds attained re- 
ceive the major weight for elimination or washback 
recommendations. The criterion of number of code 
groups received after certain amounts of training 
represents an objective and continuously distributed 
measure of proficiency. 

The primary criterion selected for the present study 
was the number of code groups achieved by the stu- 
dents tested after 14 actual academic weeks of train- 
ing. It should be pointed out that this stage of 
training represents a somewhat more advanced cri- 
terion of proficiency than has generally been used in 
most previous radio operator validity studies. The 
fourteenth week of training is the stage at which 
students who have not achieved a receiving speed of 
at least 10 groups per minute are recommended for 
elimination. As it turned out, the mean code speed 
attained by the current sample at the end of this 
time was 12.1 groups per minute with a standard 
deviation of 4.7, although some of the students 
reached as much as 25 groups per minute. 


Results 


As indicated earlier, the present sample of 
trainees was subject to considerable curtail- 
ment due to previous selection for the course 
on the basis of the ROAI and SCCAT. This 
condition has the effect of artificially reduc- 
ing the size of the intercorrelations among 
the tests as well as the validity coefficients
obtained. If appropriate statistical corrections
are not applied, the tests may appear to be less
valid than they actually are. What we are really
interested in is the utility of these tests for
selecting students from an unrestricted sample of
basic trainee airmen. Consequently, the correlations
obtained among the variables were all corrected for
double restriction (due to selection on the basis of
the ROAI and SCCAT tests), using the matrix
multivariate restriction formulas described by
Gulliksen (4) and by Thorndike (8).


*This procedure has since been changed. The
ARC Test has replaced the SCCAT and is weighted
into the ROAI, yielding a single cut-off score.


Table 2

Intercorrelations* Among the Radio Operator Variables
(N = 400)

Variable 1 3 4 7 8 10 11
 1. Pitch -- 09 24 18 26 36 17 16
04
 2. Loudness 15 02 08 04 07 04 09
 3. Rhythm 27 -- 28 29 40 23 01
 4. Time 21 10 29 -- 20 28 10 04
 5. Timbre 18 49 23 18 08 13 04
 6. Tonal Memory 51 08 32 18 13 32 27 22
 7. Dot Perception 35 10 31 23 -- 68 09 17
 8. Code Distraction 45 13 42 31 72 -- 15
 9. ARC 44 26 25 24 43 48 27 22
10. SCCAT 32 11 30 16 7 45 22 37 f = 08
11. ROAI 39 20 09 12 50 37 38 36 --

* Values above the diagonal are the obtained (restricted) correlations; values below the diagonal are corrected for restriction
of range. Decimals are omitted.

Table 2 presents the intercorrelations among 
the experimental tests, the ARC, SCCAT, and 
ROAI. Values above the diagonal are the 
obtained restricted correlations; values be- 
low the diagonal are the “corrected” (unre- 
stricted) correlations. 
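
The effect of such a correction can be illustrated with the simplest single-variable case. The sketch below is not the matrix multivariate procedure used in this study; it assumes direct selection on one predictor only (the form often labeled Thorndike's "Case 2"), and the function name and the numbers passed to it are hypothetical, chosen only to show the direction and rough size of the adjustment.

```python
import math


def correct_for_direct_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Single-variable correction for restriction of range.

    Assumes selection was made directly on the predictor itself.
    r_restricted    : correlation observed in the selected group
    sd_unrestricted : predictor SD in the unselected population
    sd_restricted   : predictor SD in the selected group
    """
    k = sd_unrestricted / sd_restricted  # ratio of unrestricted to restricted SD
    r = r_restricted
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)


# Hypothetical values: an observed validity of .23 and a predictor whose
# SD shrank from 15 to 10 as a result of selection.
corrected = correct_for_direct_restriction(0.23, sd_unrestricted=15.0, sd_restricted=10.0)
print(round(corrected, 3))
```

When the two standard deviations are equal the function returns the observed correlation unchanged, and the more severe the curtailment, the larger the upward adjustment.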

Table 3 presents the means and standard
deviations of each of the variables as well as
the correlation between each test variable
and the criterion of success (groups per minute
received at the end of 14 weeks). The
uncorrected as well as the corrected validity
coefficients are presented.

The validity coefficients in Table 3 are all 
significant beyond the 1% level of confidence, 
although some are of rather low magnitude. 
For predicting receiving speed at this level of 
proficiency, the ARC Test emerges as the 
most valid single test. The printed composite 
(since revised) apparently had only a low 
positive validity for predicting this particular 
criterion. Of the experimental tests, the Code 
Distraction, Dot Perception, and Rhythm 


Table 3

Means, Standard Deviations, Validities, and Beta Weights of Radio Operator Variables
(N = 400)

                                            Validity        Validity      Beta
Variable               Mean      SD         (Uncorrected)   (Corrected)   Weights

 1. Pitch              37.56     6.63       .12             .20           -.099
 2. Loudness           33.22     7.38       .14             .17            .039
 3. Rhythm             26.93     [illeg.]   .31             [illeg.]       .173
 4. Time               41.32     [illeg.]   .18             [illeg.]       .030
 5. Timbre             43.64     [illeg.]   .20             [illeg.]       .067
 6. Tonal Memory       22.31     [illeg.]   .18             [illeg.]       .012
 7. Dot Perception     45.10     [illeg.]   .25             [illeg.]       .030
 8. Code Distraction   129.81    [illeg.]   .32             [illeg.]       .130
 9. ARC                79.15     [illeg.]   .37             [illeg.]       .288
10. SCCAT              56.88     [illeg.]   .23             [illeg.]       .167
11. ROAI               6.73      [illeg.]   .07             [illeg.]      -.024

Note. Multiple R of complete battery = .537.







Tests appear to predict individually at about 
the same level or better than the Signal Corps 
Code Aptitude Test. 

Of perhaps greater importance than the in- 
dividual test validities are the unique con- 
tributions made by the individual tests to- 
ward predicting the criterion when the tests 
are used in combination. The problem re- 
solves itself into finding the best combination 
of tests that yield the highest combined pre- 
diction. The last column of Table 3 pre- 
sents the beta weights for the individual tests 
in the complete battery. A multiple R of 
.537 was achieved by the tests including the 
ARC, SCCAT, and ROAI. Some of the vari- 
ables are contributing very little to this pre- 
diction, however. The ARC contributes al- 
most twice as much as any other single 
variable. The printed-test composite added 
nothing to the over-all predictive value of the 
battery, and this was true for all combinations
which also included the ARC Test.5
More complete multiple correlational analysis 
of these data (see 2) indicated that the ARC, 
Rhythm Discrimination, and Code Distrac- 
tion Tests yield the optimum prediction pos- 
sible with the fewest number of tests (R = 
.513). Inclusion of additional tests provided 
increments so slight that they might be ex- 
pected to occur by chance simply from the 
increase in the number of parameters in- 
cluded in the prediction equation. 
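
The relation between beta weights and the multiple R can be sketched as follows. For standardized variables the beta weights solve the normal equations formed from the predictor intercorrelations and the validities, and R squared is the sum of each beta times its validity. In the sketch below the three validities are those quoted in the Summary for the ARC, Rhythm, and Code Distraction Tests, but the predictor intercorrelations are hypothetical, so the result is not expected to reproduce the reported R of .513. The function names are illustrative only.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Bring the largest remaining entry in this column to the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def beta_weights_and_multiple_r(predictor_corr, validities):
    """Return standardized regression weights and the multiple correlation."""
    betas = solve_linear_system(predictor_corr, validities)
    r_squared = sum(b * v for b, v in zip(betas, validities))
    return betas, math.sqrt(r_squared)


# Validities from the Summary (ARC, Rhythm, Code Distraction);
# the intercorrelations below are hypothetical.
predictor_corr = [
    [1.0, 0.3, 0.4],
    [0.3, 1.0, 0.2],
    [0.4, 0.2, 1.0],
]
validities = [0.44, 0.34, 0.38]
betas, multiple_r = beta_weights_and_multiple_r(predictor_corr, validities)
print([round(b, 3) for b in betas], round(multiple_r, 3))
```

A test that correlates highly with tests already in the battery receives a small beta weight even when its own validity is respectable, which is why additional tests may add little to the combined prediction.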


Summary 


This report has described a preliminary 
study designed to evaluate the utility of cer- 
tain auditory-perceptual tests for prediction 
of radio operator success in training. The 
tests evaluated included measures of (a) pitch 
discrimination; (b) loudness discrimination; 
(c) rhythm discrimination; (d) time discrimination;
(e) timbre discrimination; (f) tonal memory;
two new experimental tests called (g) Dot Perception,
and (h) Code Distraction; (i) the ARC; and (j) the
SCCAT Tests. A printed-test composite
score was also included in the analysis. The 


5 It should be stressed that the printed-test composite
has traditionally been validated against final
school grade, which includes academic-type achieve- 
ment as well as code achievement. Against this cri- 
terion, the printed ROAI test possessed considerably 
higher validity (3). 


criterion of success was speed of code recep- 
tion (number of groups per minute) at the 
end of the fourteenth week of training. Some 
of the primary results indicate: 

1. All the validity coefficients were signifi- 
cant beyond the 1% level of confidence, but 
some were of rather low magnitude. 

2. The best single predictor turned out to 
be the Army Radio Code Test (r = .44). 

3. Three of the other experimental tests 
yielded approximately equal or better indi- 
vidual predictions than the Signal Corps Code 
Aptitude Test (r = .33). These tests were
the Code Distraction (r = .38), Dot Percep- 
tion (r = .31), and Rhythm Discrimination 
(r = .34) Tests. 

4. A combination of the Army Radio Code, 
Rhythm Discrimination, and Code Distrac- 
tion Tests yielded a multiple correlation with 
the criterion of .513 which for practical pur- 
poses was comparable to that achieved by 
the total battery. 

5. The results also suggest that aural tests 
are likely to achieve a degree of prediction of 
later code proficiency not possible by printed 
tests alone.


The Relationship Between Rifle Steadiness and Rifle Marksmanship
and the Effect of Rifle Training on Rifle Steadiness


The results of research agree in describing
the ataxiameter test of rifle steadiness as a 
reliable instrument, valuable for the predic- 
tion of marksmanship performance. Seashore 
and Adams (4) reported a reliability coeffi- 
cient of .89, and found that six members of 
a university rifle team were steadier than an 
unselected group of 50 male students enrolled 
in military drill classes. Humphreys, Bux- 
ton, and Taylor (2) reported a reliability of 
.94 (estimated, Spearman-Brown), and found 
that the steadiness scores of men in a rifle 
team corresponded highly (r = .72) with the 
ratings given the men by the coach of the 
team. Spaeth and Dunham (5), using an 
arm-and-stylus test, found a rho of .61 be- 
tween their measure of steadiness and the 
rifle marksmanship scores for 73 marksmen 
at Fort Meade. 

Several experimenters (1, 2, 4, 5) have ex- 
pressed interest in whether or not rifle train- 
ing affects rifle steadiness. Only one attempt 
to resolve this question, however, has been 
alluded to. Humphreys, Buxton, and Taylor 
(2), citing an unpublished paper by Belton, 
Blair, and Humphreys, state that the rela- 
tively long period of training in rifle firing 
employed in that study produced little im- 
provement in rifle steadiness. 

The aims of the present study were (a) to
estimate the reliability of an ataxiameter test
designed to measure rifle steadiness, (b) to
estimate, on the basis of the measurements
obtained with this test, the relationship between
rifle steadiness and rifle marksmanship, and
(c) to determine the effect of rifle training
on rifle steadiness.

1 Human Research Unit No. 1, CONARC, is established
under the command of the Commanding
General, Headquarters Continental Army Command.
The Human Resources Research Office, the George
Washington University, operating under contract
with the Department of the Army, employs the Director
of Research and other civilian staff members
who are assigned to the Unit with the approval
of Commanding General, Headquarters Continental
Army Command. The Human Resources Research
Office provides the Unit with technical supervision
in the planning and analysis of the research projects.

Conclusions stated herein do not necessarily represent
the official opinion or policy of Commanding
General, Headquarters Continental Army Command,
or the Department of the Army.


Method 


The data for the study were obtained in the course
of a larger experiment in which a Whole Method of
Army rifle marksmanship instruction was compared
with the method in current use, which is essentially
a Part Method (3). Table 1 shows the experimental
design of the larger experiment, which was replicated
twice, each time at a different military installation.
Rifle steadiness data were gathered at two
separate stages of the experiment: once before the
experiment proper and once immediately before
criterion firing.


Table 1

Condensed Experimental Design for the Study in
Which the Present Data Were Collected

Phase            Group I     Group II    Group III   Group IV

1                Rifle Steadiness Test (All Groups)
2 (Training)*    Method A    Method B    Method C    No Training
3                Rifle Steadiness Retest (All Groups)
4 (Criterion)    Rifle Firing (All Groups)

* For Groups I, II, and III, basic training for this two-week
period consisted of 28 hours of preliminary rifle instruction
interspersed with training unrelated to weapons. Group IV
instruction, during this same time, was entirely unrelated to
weapons.


Subjects. The first replication of this study, un- 
dertaken at Fort Knox, Kentucky, involved the use 
of 148 subjects (Ss). The study was repeated at 
Fort Jackson, South Carolina, with 200 Ss. For 
both replications, Ss were male, Light Infantry basic 
trainees, with “A” physical profiles (physically fit 
for combat duty). For the Fort Knox replication, 
the average intelligence of Ss was 90.8 (SD = 28.0), 
as measured by Army Classification Battery Aptitude 
Area I scores. The mean number of years of education
was 10.5 (SD = 3.7; Median = 8.5). For the Fort Jackson
replication, average intelligence was 86.3 (SD = 18.4);
mean number of years of education, 9.4 (SD = 2.5;
Median = 8.4).


Fig. 1. The rifle ataxiameter.


The steadiness apparatus. The ataxiameter used 
in this study, diagrammed in Fig. 1, is patterned after 
Humphreys’ (2) modification of an apparatus used 
by Seashore and Adams (4). It consists of a light 
wooden frame, 12 × 12 in., fixed vertically on a 2 ×
4 × 60-in. horizontal board beneath which are attached
four work adders. Within this wooden
frame, a small peg is suspended on four threads, 
two in the horizontal plane and two in the vertical 
plane. Each thread is directed by means of pulleys 
to one of the work adders. The apparatus is ad- 
justed to S’s height as he aims the rifle at a small 
bull’s-eye placed on a wall about 15 ft. away. As the 
muzzle of S’s rifle sways within the wooden frame, 
the corresponding movement of the threads attached 
to the peg turns the dials of the work adders, record- 
ing movement in each of the four directions, up, 
down, right and left. A switch, which activates a 
dial-locking bar, permits recording at will. 

Steadiness test procedure. At the beginning of 
each test series, S was given the following instruc- 
tions: 

“This is a test to see how steady you can hold a 
rifle. I want you to line up the bull’s-eye on the 
other side of the room through your sights (tester 
demonstrates and hands rifle to trainee). Hold it 
as steady as you can until you are told to stop 
(tester allows S to get into position—about three 
seconds). Ready? Start.” 

The test, as given in this experiment, employed 
three testers: one to give S his instructions, one to 
manipulate the apparatus, and one to record data. 
The second tester turned the switch to the recording 
position simultaneously with the word, “Start.” At 
the end of a 15-sec. interval (timed with a stop- 
watch), the second tester turned the switch to the 
“off” position, told S to stop work and to relax, 
gave the readings on the work adders to the third 
tester, and reset the dials to zero. After a lapse of
about 30 sec., the first tester instructed the trainee
again as follows: “Now we will try it once more. 
Remember to hold it as steady as you can until you 
are told to stop. Ready? Start.” 

The test was finished at the end of this second 
15-sec. trial. The S’s score on the test was the sum 
of the dial readings for both of the 15-sec. trials. 
As previously stated, the ataxiameter test was ad- 
ministered twice in each replication of the study in 
which the present data were obtained. 


Reliability of the Rifle Steadiness Test 


A split-half reliability was estimated for 
each administration of the ataxiameter test 
by correlating the summed dial readings of 
the two 15-sec. trials and correcting the re- 
sult by the Spearman-Brown formula. In 
this manner, the reliability of the initial test 
(Phase 1 in Table 1) was estimated, for the 
Fort Knox replication, as .82, for the Fort 
Jackson replication, .87. Similarly, the re- 
liability of the test administered prior to cri- 
terion firing (Phase 3 in Table 1) was found 
to be, in the first replication, .82; repeated, 
.88. Test-retest reliability (Phases 1 and 3
in Table 1) for the first replication was .65, 
for the second, .45. 
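
The Spearman-Brown step described above can be sketched as follows. The function names and the trial scores are hypothetical; the formula itself is the standard one, R = nr / (1 + (n - 1)r), where r is the correlation between parts and n is the factor by which the test is lengthened (n = 2 for the two 15-sec. trials).

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r, n=2.0):
    """Estimated reliability of a test n times as long as the correlated parts."""
    return n * r / (1.0 + (n - 1.0) * r)


def part_correlation_from_corrected(reliability, n=2.0):
    """Inverse of spearman_brown: the part correlation implied by a corrected value."""
    return reliability / (n - (n - 1.0) * reliability)


# Hypothetical summed dial readings for six Ss on the two 15-sec. trials.
trial_1 = [98, 120, 87, 143, 110, 131]
trial_2 = [105, 111, 95, 150, 99, 127]
r_trials = pearson_r(trial_1, trial_2)
print(round(spearman_brown(r_trials), 2))

# A corrected reliability of .82 implies a between-trial correlation near .69.
print(round(part_correlation_from_corrected(0.82), 2))
```

The same function with n = 3 corresponds to the case in which an average intertrial correlation is stepped up to the reliability of a three-trial total.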


Marksmanship Criteria 


Data for the marksmanship criteria were obtained 
in four days of M1 Rifle firing on an Army rifle 
range. A detailed discussion of the procedures fol- 
lowed in this firing may be found in the report of 
the experiment during which the present data were 
obtained (3). Briefly, each S fired a total of 100 
shots in slow fire and 72 shots in sustained (rapid) 
fire at distances from the target of 100, 200, 300, 
and 500 yd. 

In a slow fire exercise, the trainee has an un- 
limited time between shots and is given knowledge 
of his accuracy after each shot. In a sustained fire 
exercise, on the other hand, he has 50 sec. in which 
to fire nine times and change a clip of ammunition. 
He does not receive information about his accuracy 
until the end of the exercise. 

The criterion firing consisted essentially of three 
trials, each containing the same set of firing exer- 
cises (minor differences were corrected for in the 
analysis). The average reliabilities of the slow and 
the sustained fire criteria were estimated by obtain- 
ing the average intertrial correlation and applying 
the Spearman-Brown formula with n = 3. In this
manner, the reliability of the slow fire criterion was 
estimated, for the Fort Knox replication, as .88, for 
the Fort Jackson replication, as .84. The reliability 
of the sustained fire criterion was similarly esti- 
mated for the Fort Knox replication, as .83, for the 
Fort Jackson replication, as .81.


Table 2

                       Test
Replication          r         N

1. Fort Knox       -.22**     148
2. Fort Jackson    -.29**     200

** Significant at the .01 level.


Rifle Steadiness and Rifle Marksmanship 


Table 2 shows the correlations obtained be- 
tween the steadiness and marksmanship scores 
for both replications of the study. Since the 
four groups within each replication differed 
significantly with respect to their marksman- 
ship scores, the correlations reported here 
were obtained by means of an analysis of 
covariance, in which deviations were taken 
about the means of the individual groups 
rather than about the grand mean. Each coefficient,
then, is a within-groups r (negative because
the ataxiameter measures unsteadiness).
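
A within-groups r of this kind can be sketched as below: deviations are taken about each group's own means, and the cross-products and sums of squares are pooled over groups before the coefficient is formed. The function name and the scores are hypothetical and are intended only to show the computation, not to reproduce the analysis of covariance actually carried out.

```python
import math


def within_groups_r(groups):
    """Pooled within-groups correlation.

    groups: list of (x_scores, y_scores) pairs, one pair per group.
    Deviations are taken about each group's own means, so differences
    between group means do not enter the coefficient.
    """
    sum_xy = sum_xx = sum_yy = 0.0
    for xs, ys in groups:
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            dx = x - mean_x
            dy = y - mean_y
            sum_xy += dx * dy
            sum_xx += dx * dx
            sum_yy += dy * dy
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical ataxiameter scores (x) and target scores (y) for two groups
# whose marksmanship means differ.
group_a = ([210, 180, 250, 190], [310, 335, 290, 330])
group_b = ([220, 170, 240, 200], [360, 390, 345, 370])
print(round(within_groups_r([group_a, group_b]), 2))
```

Because group means are removed first, a training method that raises every score in a group leaves the coefficient unaffected.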

There appears to be no question that rifle 
steadiness is significantly related to slow fire, 
since each of the correlations is significant 
beyond the .01 level. Nevertheless, the average
correlation is much lower than the correlation
of .72 obtained in the Humphreys, Buxton, and
Taylor (2) experiment. The correlation between
steadiness and sustained fire appears to be
still lower.2

There is, of course, the possibility that the
strength of the relationship found in the
Humphreys, Buxton, and Taylor (2) study
may be reconciled with the much lower relationship
found in the present study. The
studies sample quite different populations.
The former study dealt with highly skilled
rifle team members; the latter, with relatively
unskilled marksmen. If it should be shown
that as marksmanship skill improves, rifle
steadiness becomes a more and more important
index for discriminating between marksmen,
the apparently discrepant results might
be seen as complementary. Although tests
made with the data of the present study, in
which the relationship between steadiness and
marksmanship was compared for several levels
of marksmanship skill, failed to yield evidence
of such a trend, the levels attained by
Ss of this study may be sufficiently below
asymptote not to contribute to a fair test
of the hypothesis. Further research on the
problem may clarify this issue.

2 Further evidence of a low relationship between
rifle steadiness and rifle marksmanship was obtained
at Fort Knox in the course of a pilot study on the
problem of flinching (an anticipatory response to the
loud report of the weapon). For 60 Ss, rifle steadiness
was measured by one 15-second trial with the
ataxiameter; rifle marksmanship, by two days of firing
on an Army rifle range in which each S fired a
total of 116 shots in slow and sustained fire exercises.
For slow fire, the correlation between rifle steadiness
and rifle marksmanship was -.18, for sustained fire,
-.22.

[Table 2, retest entries: r = -.23**; N = 137 and 199.]


Rifle Training and Rifle Steadiness 


The results of the present study provide 
evidence that rifle training does not affect 
rifle steadiness. 

In the experiment which provided the pres- 
ent data, three of the four groups were given 
28 hours of Army rifle instruction prior to 
their criterion firing. This training included 
a considerable amount of handling, aiming, 
and firing of the rifle. A no-training group, 
on the other hand, had no access to weapons 
during this period. If rifle training affects 
rifle steadiness, one might expect a significant 
decrease (since the ataxiameter measures un- 
steadiness) in ataxiameter scores from test to 
retest for the trained groups in relation to 
whatever changes take place with the un- 
trained group. 

The analysis of covariance was used to
equate the groups in terms of the initial test
(Phase 1 in Table 1). This analysis showed
that no significant change in the rifle steadiness
of the trained groups in relation to that
of the untrained group (F = 1.48) had taken
place at the time of the retest for the Fort
Knox replication, as shown in Table 3.


Table 3

Relative Changes in Steadiness of Trained and
Untrained Groups

                                                    Adjusted
Group              Statistic    Test      Retest    F

Fort Knox Replication

Trained Groups     Mean         235.39    194.26    1.48
                   SD            63.58     73.45
                   N            104       104

Untrained Group    Mean         226.61    170.33
                   SD            74.10     47.88
                   N             33        33

Fort Jackson Replication

Trained Groups     Mean         197.86    188.72    6.50
                   SD            63.21     63.27
                   N            149       149

Untrained Group    Mean         203.00    219.42
                   SD            62.42    100.38
                   N             50        50


A significant difference (F = 6.50) was ob- 
tained with the data of the Fort Jackson 
replication. On closer examination, however, 
it was found that changes had taken place in 
the steadiness of both groups, and that the 
changes were opposed. When the significance 
of the change in steadiness of the trained 
groups was tested, a t of 1.70 (df = 148)
was obtained; for the change in steadiness of
the untrained group, a t of 1.20 (df = 49)
was obtained. Since neither of these changes 
can be called significant, it may be concluded 
that the significant F obtained here was the 
product of opposed, random fluctuations, 
neither of which was significant in itself. (It 
may also be noted that the mean difference 
for the groups of the Fort Knox replication is 
opposed to that of the Fort Jackson replica- 
tion.) 
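
A test of the change within a single group can be sketched as a t for correlated measures, computed from each S's retest-minus-test difference. That the t values above were obtained in exactly this way is an assumption, suggested only by the degrees of freedom (N - 1); the function name and scores below are hypothetical.

```python
import math
import statistics


def paired_t(test_scores, retest_scores):
    """t for the mean change between two measures on the same Ss.

    Returns (t, degrees_of_freedom), with df = N - 1.
    """
    diffs = [after - before for before, after in zip(test_scores, retest_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences (N - 1 divisor)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical ataxiameter scores for five Ss at test and retest.
test = [210, 195, 240, 180, 225]
retest = [200, 199, 221, 185, 214]
t_value, df = paired_t(test, retest)
print(round(t_value, 2), df)
```

A negative t here would indicate a decrease in ataxiameter score, that is, an increase in steadiness, from test to retest.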

According to the data for the two replications,
no evidence has been found that rifle
training affects rifle steadiness. This finding
supports the unpublished evidence cited
by Humphreys, Buxton, and Taylor (2).


Summary 


The aims of the present study were (a) to 
estimate the reliability of an ataxiameter test 
of rifle steadiness, (b) to estimate the relationship
between rifle steadiness and rifle
marksmanship, and (c) to determine the ef- 
fect of rifle training on rifle steadiness. 

The study was replicated twice, each time 
at a different military installation, once with 
148 Ss, once with 200 Ss. Target scores were 
used as criterion data. 

This study agrees with previous studies in 
finding the rifle ataxiameter test to be a re- 
liable instrument. It fails, however, to find 
as high a relationship (.72; .61) between 
steadiness and marksmanship as the other 
studies reported. The present study finds 
the relationship between rifle steadiness and 
rifle marksmanship to be about — .24 for 
slow fire, and generally insignificant (al- 
though consistent in sign) for sustained 
(rapid) fire (the coefficient is negative be- 
cause the test actually measures unsteadi- 
ness). No evidence is found that rifle train- 
ing affects rifle steadiness. 


Received June 23, 1954.


The Problem


The problem is twofold. It deals with the 
evaluation of a simple color-naming test for 
color blindness, using the Eastman Color 
Temperature Meter, and the comparison of 
the latter and three other color-vision tests. 


Apparatus 


The apparatus consisted of four color- 
vision tests: (a) The Eastman Color Tem- 
perature Meter, (b) the Farnsworth-Munsell 
100-Hue Test for Color Vision, (c) the 
American Optical Company Pseudo-Isochromatic
Plates for Testing Color Perception,
and (d) the Freeman Illuminant-Stable
Color Vision Test.1 Hereinafter, these tests
will be referred to as the Eastman, the Farnsworth,
the American Optical, and the Freeman,
respectively.


The Eastman as used in this experiment
involves a point source of light from a 100-w.,
frosted, incandescent lamp. This point source
of light was obtained by mounting the meter
on one side of a box with a small hole at the
opposite side through which the light could
shine. The meter has a bipartite field, the
right side being relatively constant in color,
and the left side being variable in color
through the rotation of colored discs within
the meter by the turning of a dial. The subject
is required to name the colors perceived
at both the right and the left fields for each
of the three settings of the dial, i.e., six colors.
The settings that were chosen were 2,500,
5,000, and 50,000° Kelvin (K).2 They were
chosen because at 2,500° K there is a maximum
amount of green in the left field, at
50,000° K there is a maximum amount of
red in the left field, and at 5,000° K both
fields are approximately alike for the experimenters.

The Farnsworth test consists of standard
Munsell colors (10) which are mounted in
plastic caps. There are 85 caps divided into
four series, each series having the first and
last cap of that series permanently affixed, and
the other caps of each series being movable.
The movable caps are randomly placed by
the experimenter and the subject attempts to
align them in a graduated color sequence
from the first stationary cap to the last stationary
cap in each of the four series. Numbers
are printed on the underside of the caps
indicating their proper sequence, thereby enabling
the experimenter to score quickly the
number of misplacements and the degree of
misplacement. The results are plotted on a
special graph which indicates the degree and
type of color deficiency.

The American Optical test consists of 18
pseudoisochromatic plates. Each plate consists
of a crazed pattern of various colored
dots. The colors of the background and foreground
of each plate are based upon a confusion
zone for the color-blind. The color-blind
person generally sees no number when
the background and the figure are composed
of colors which the color-blind confuses.

The Freeman test consists of 12 plates.
This test makes use of the Wertheimer-Benussi
Effect (the reciprocal influence of
color and form in the perceptual organization
of numbers on the plates) (9). The test is
similar to the American Optical in that it
makes use of the confusion zones. However,
the Freeman uses form factors in order to
confuse, whereas the American Optical uses
crazed patterns of dots. The Freeman is
illuminant-stable as tested over a range of
2,400 to 10,000° K (6).


1 After the completion of this experiment, a new
and presumably improved edition of the Freeman
test, consisting of merely six plates, appeared on the
market.

2 In order to provide a standard reference scale for
colors of illuminants, color temperature has been
used. In this system the color and the spectral distribution
curve of the source in question are matched
with those of a standard black body heated to a
given absolute temperature. When such a black
body is heated it first becomes red, then orange,
then near white, then blue. The color temperature
of candle light, which has a maximum of red and a
minimum of blue light, for example, is 1,800° K
(Kelvin). The approximate color temperatures of
a kerosene lamp, standard daylight, and northern
zenith skylight are 2,000, 6,500, and 13,000° K,
respectively.

In this study, the light used for the Ameri- 
can Optical, the Farnsworth, and the Free- 
man tests had a spectral distribution of about 
6,500° K (equivalent to standard daylight), 
as measured by the Eastman Color Tempera- 
ture Meter. The amount of illumination was 
well over the minimum requirement of 25 ft-c. 


Discussion of Background Material 


The Eastman Color Temperature Meter was first suggested for use as a
color deficiency test by Dimmick (1) in 1942. Following this suggestion,
Rowland (11, 13) investigated its possibilities as a simple
anomaloscopic color-vision test. She used a meter both with and without
an auxiliary Wratten green No. 11 filter in order that both color-blinds
and normals could make a match. A match made beyond the accepted range
indicated color blindness. The results with the Eastman meter, using a
matching of two fields, indicated that this apparatus was quite good in
comparison with the American Optical and the Rabkin pseudoisochromatic
tests (12). Making a match by use of an apparatus test, however, is time
consuming. In addition to using a point of reversal, or match, Rowland
used color naming for 14 settings of the meter, seven with a filter and
seven without. Her preliminary results indicated that the instrument
might be used effectively as a naming test for color blindness. However,
she used 14 settings which required considerable time for
administration. Some preliminary work by the present investigators led
to the hypothesis that this device could be used effectively with merely
three settings.

The American Optical test has been evaluated in a number of studies
(4, 5, 6, 8, 14, 15). Since the Hardy, Rand, and Rittler study (7) of
1945 has shown that the abbreviated 18-plate edition of the American
Optical is a good pass-fail color-vision test, the test has been
accepted by most students of color vision, and is widely used.

Farnsworth in his preliminary study (2),
and in his manual of instructions (3), indi- 
cates that the Farnsworth-Munsell 100-Hue 
test successfully measures both degree and 
type of defect. Rowland (14) evaluated the 
Farnsworth test, using a large sample, and 
found the Farnsworth to be very time con- 
suming and occasionally an inadequate de- 
terminer of type and degree of defect. The 
Farnsworth has also been investigated by 
Thomas (15). Of 77 Ss used in his study, 
only 13 Ss were definitely color blind. 
Thomas’ study of the Farnsworth indicates 
that there is considerable overlap between 
color defectives and normals, and that there 
is little agreement with the other color tests 
used in his battery. He explains the disagree- 
ment by saying the Farnsworth is probably 
measuring some other attribute. The Farns- 
worth was selected for use in this study be- 
cause these contradictory findings need fur- 
ther investigation. 

The Freeman Illuminant-Stable Color Vi- 
sion Test has been subjected to only one 
validation study. Freeman and Zaccaria 
(6) made a comparison of the relative merits 
of the American Optical and the Freeman 
tests over a wide range of color temperatures 
from 2,400 to 14,000° K. The Freeman test 
gives more consistent results under these 
varying conditions than does the American 
Optical test. 

With so many color-vision tests available, 
we were interested in finding out how the re- 
sults obtained from one test differ from those 
of other tests. The present experiment was 
designed to answer such a question with re- 
gard to the four color-vision tests discussed 
above. 


Method 


Subjects. Two samples of young men were used 
in this study. Sample A consisted of 25 Ss—stu- 
dents and faculty of Trinity University—used in the 
development of the new test. Of these 25 Ss, 16 
“passed” every plate on both the American Optical 
and Freeman tests. Their responses to the Eastman 
were tabulated in order to obtain the normal range 
of responses (color names) given for each of the 
three settings of the Eastman. Sample B consisted 
of 100 Ss used in the validation study. Of these Ss, 
49 volunteered as normals and 51 volunteered as 
color-blinds. Most of these Ss were basic airmen 
trainees at Lackland Air Force Base, San Antonio, 
Texas. Some of the Ss who volunteered as being 
color blind were found to be color blind through a 
Table 1

Comparison of Four Tests for Color Blindness in Terms of
Per Cent Disagreement and Fourfold Point Correlation

Test                            Per Cent        Fourfold Point
                                Disagreement    Correlation

Farnsworth-Eastman              27%             .46
Farnsworth-American Optical     26%             .50
Farnsworth-Freeman              22%             .55
Freeman-American Optical        12%             .78
Freeman-Eastman                 11%             .78
Eastman-American Optical         5%             .90


previous screening by the USAF which used some 
plates of the American Optical test. In effect, this 
became somewhat of a pretest for some Ss. 

Procedure. Sample A Ss were given both the 
Freeman and the American Optical tests, and indi- 
cated the color on each of the two fields of the 
Eastman for each of the three settings. The re- 
sponses of the 16 Ss making perfect scores on these 
two polychromatic tests constituted the normal 
range for the Eastman test in the evaluation study. 
Sample B Ss were administered the four color-vision 
tests in accordance with the recommended testing 
conditions. Each of these Ss was classified as nor- 
mal or color blind on each of the four tests. 

For the Eastman test, anyone giving one or more 
responses outside of the range of the “normal” re- 
sponses was considered color blind. Classification of 
color blindness on the American Optical required a 
failure of at least five plates. Failure of at least 
two plates on the Freeman indicated color blind- 
ness. Failure of the Farnsworth was determined by 
graphing the results according to the specifications 
set up by Farnsworth. Comparisons of the four 
tests were made in terms of percentage disagreement, 
and fourfold point correlation. 
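
The pass-fail rules and the two comparison statistics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the 2 x 2 cell counts are hypothetical and are not taken from the study's data, and the Farnsworth is omitted because its classification depends on a graph rather than a simple count.

```python
import math

# Minimum number of failures that classifies an S as color blind,
# as stated in the Procedure (Farnsworth omitted: it is scored graphically).
FAIL_THRESHOLDS = {
    "eastman": 1,           # responses outside the "normal" range
    "american_optical": 5,  # plates failed
    "freeman": 2,           # plates failed
}


def is_color_blind(test_name, failures):
    """Apply the pass-fail rule for one test to one S's failure count."""
    return failures >= FAIL_THRESHOLDS[test_name]


def fourfold_summary(a, b, c, d):
    """Compare two pass-fail tests from a 2 x 2 table.

    a = color blind on both tests, b = color blind on test 1 only,
    c = color blind on test 2 only, d = normal on both tests.
    Returns (per cent disagreement, fourfold point correlation).
    """
    n = a + b + c + d
    pct_disagreement = 100.0 * (b + c) / n
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    phi = (a * d - b * c) / denom if denom else float("nan")
    return pct_disagreement, phi


# Hypothetical counts for 100 Ss (not the published cell frequencies).
pct, phi = fourfold_summary(48, 3, 2, 47)
print(f"{pct:.0f}% disagreement, fourfold point correlation {phi:.2f}")
```

With these illustrative counts the two tests disagree on 5 of 100 Ss and the coefficient comes out near .90, which shows how a small percentage of disagreement and a high fourfold point correlation go together when the two classes are about equal in size.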


Results 


The results of the experiment are given in
Table 1. The Farnsworth test shows little
agreement with any of the other three. The 
Eastman agrees with either of the two poly- 
chromatic plate tests as well as the poly- 
chromatic plate tests agree with each other. 
As Thomas has indicated (15), possibly the 
Farnsworth is testing something else, and 
cannot be used as a pass-fail test. 


Discussion 


Aside from the effectiveness of the four tests as color-blindness
measures, there are other factors that are important. These are the
amount of training necessary for the examiner, the effect of coaching on
test performance, the stability of colors in the test items, and the
amount of time required for administration and scoring.

All four tests are relatively easy to ad- 
minister. However, the Farnsworth requires 
a more skillful examiner to score and
interpret the results.

During World War II, many a color-blind 
applicant for flight training was able to pass 
the American Optical by memorizing the 
background pattern for the various plates. 
For the 18 plates of the American Optical, 
there are 15 different background patterns. 
Since most of the color-blinds do not fail 
more than 15 of the plates, it means that only 
about 10 patterns have to be memorized. It 
is almost impossible for the color-blind S to 
receive successful coaching on the Farns- 
worth, since the only cues are the colors of 
the caps. As for the Eastman, as used in 
this experiment, success of coaching depends 
on whether or not the examiner varies the 
order of presentation of settings from S to S 
and interchanges the bipartite fields. Coach- 
ing is relatively unsuccessful with the Free- 
man test in view of the fact that all of the 
plates have the same basic form which com- 
prises a checkerboard design broken by sev- 
eral circles and diagonals. This is a very in- 
genious device in that only the colors vary 
from one plate to another and almost any 
numeral can be formed by setting it off from 
the background by a difference in color. The 
plates are placed in a loose-leaf binder, so 
that the order of presentation can be varied. 
Because of these factors, it is almost impos- 
sible to obtain successful coaching on the 
Freeman. 

The Freeman has been shown to be a re- 
liable test over a wide range of color tempera- 
tures (6). The Eastman portable unit con- 
tains its own light source. The American 
Optical and the Farnsworth require specific 
lighting conditions. Both of these tests re- 
quire the color temperature to be 6,500° K. 
Variation of color temperature within a room 
throughout the day seriously affects the re- 
sults of the American Optical (6). Under 
operational conditions, it is impractical to 
control adequately the lighting since the 
“testing” room is often used for other ac- 
tivities also. 

Since the American Optical uses a wide 
range of different colors, there is a differential
effect by bleaching and handling. Touching 
the plates with the fingers or exposing them 
to strong light alters the colors and the rela- 
tionships among the colors. When these 
changes take place, the test no longer has 
the same validity. The Farnsworth colors 
are not as subject to bleaching and the colored 
paper in the caps is somewhat protected 
from the fingers. The only thing that readily 
alters the Eastman color filters is extremely 
hot temperatures. The Freeman test uses 
only six colors which were in part selected 
because of their tendency to resist bleaching 
and the plates may be touched freely since 
they have been laminated with clear plastic 
as a protection against soiling. 

A test requiring a minimum of testing time 
and yet having considerable agreement with 
other tests of color vision is often desired 
by the test administrator. Presentation time 
of any color-vision test is not indicative of 
any greater validity or reliability except pos- 
sibly in the case of the apparatus tests. Of 
the tests used in the present experiment, the 
Farnsworth required the most time (approximately
10 min. to present the test plus approximately
20 min. to graph the data obtained).
The American Optical took about
4 min. for presentation. The Freeman re- 
quired about 3 min. for administration. The 
Eastman took less than a minute for presenta- 
tion. Normal Ss took less time than color- 
blind Ss on all of these tests. 


Summary and Conclusions 


The problem dealt with the evaluation of 
the Eastman Color Temperature Meter as a 
color-naming test for color blindness and with 
the comparison of the Eastman meter, the 
Farnsworth, the American Optical, and the 
Freeman tests.

The data indicated that there was fairly 
high agreement among the American Optical, 
Eastman, and Freeman tests. The naming 
test (Eastman) correlated as high with the 
two polychromatic tests as the latter cor- 
related with each other. The correlations of 
the Farnsworth with the other three tests 
were somewhat lower, indicating that the
Farnsworth is apparently measuring factors 
not measured by the other tests. 

It appears that the Eastman Color Temperature
Meter has practical value as a color-blindness test.

It is significant that high correlations were 
obtained between naming and color compe- 
tence. This relationship has frequently been 
unjustly ridiculed. 


Received July 14, 1954. 
A direct test of immediate memory for the
hue of colors has been developed to fill an 
important gap in the battery of tests already 
available for assessing personnel with respect 
to their aptitude for occupations in which 
color skills are required (6, pp. 139-144). 
There are many activities and judgments 
which depend on color memory. In fact, with 
the exception of closely juxtaposed color 
matches in small fields, all cases of color com- 
parison involve a memory element. When- 
ever a color inspector or a technical color 
tester compares a color sample in one location 
with a color standard in another location, 
color memory is necessarily exercised. The 
observer has to carry or hold the first impres- 
sion in order to decide how the second com- 
pares with it. The comparison is made between 
a perception and the mnemonic remnant of a 
previous perception. It is well known that 
mnemonic remnants fade with time, or are 
interfered with by activities intervening be- 
tween the perception and its reinstatement 
(5, pp. 465-481). Consequently the observer 
may remember the first color incorrectly, or 
he may even forget it completely. The same 
is true, of course, in the more obvious situa- 
tion where the sample and standard are not 
both physically present at once but follow 
each other in time. 

Color comparisons of this type are by no 
means limited to color inspectors and tech- 
nical testers. Color memory is relied upon 
by the artist who looks back and forth be- 
tween the colors on his canvas and those of 
the landscape, by the housewife who goes to 
purchase curtains which will match the trim
of her kitchen, and by the photographer who
is trying to decide how well his color print 
matches the original scene—in a word, in all 
cases where a past color must be reproduced 
or a present color must be judged relative to 
a past color. The terms “present” and “past” 
here refer to perception rather than to mere 
physical existence. 


The test to be described evolved over a 
period of seven years through six earlier 
forms, each of which was found wanting in 
one respect or another. The main improve- 
ments have been directed toward increasing 
the reliability of the test by such means as 
restricting and controlling the testee’s task, 
controlling his visual adaptation, and in- 
creasing the homogeneity of the test items. 
The original form of the test was similar to 
one designed some time ago by Newhall and 
utilized by Smith (7) in a study of color 
discrimination. 

The test is an individual-type test which 
makes use of hue chips that have already 
been highly standardized and are commercially 
available (2). It may be administered and 
scored by nontechnical personnel after a brief 
period of training. In general, a testee is 
asked to select from a number of hue chips 
one which looks like a chip presented a short 
time before. He is scored with respect to the 
accuracy of his hue judgment. 


Test Instrument 


The test instrument consists of a wheel, shown in 
Fig. 1, on which the test chips and comparison chips 
are mounted. The instrument is enclosed in a carry- 
ing case, the top surface of which serves as the im- 
mediate visual surround, as shown in Fig. 2. The 
comparison chips consist of the 43 odd-numbered 
hues of the Farnsworth-Munsell hue series (2). 
(There are 85 hues in the full series.) The hues are 
approximately equally spaced visually and have 
nearly equal chroma and value (Munsell terms for 
saturation and lightness, respectively). Duplicates 
of 20 of these 43 hues are used as test chips, and 
duplicates of two other hues are used as practice 
test chips. The 65 chips are mounted, as shown in 
Fig. 1, in two concentric circles on the wheel which 
rotates freely on ball bearings. The outer circle 
consists of the 43 comparison chips arranged in 
spectral order to create a closed circular hue dimen- 
sion represented discontinuously by small visually 
uniform hue steps (about 2.2 Munsell hue steps per 
interval). The circular arrangement eliminates any 
visual end points to which judgment might otherwise 
be anchored. The inner circle consists of the 20 test chips and the two
practice chips mounted so as to correspond radially to the comparison
chips of which they are duplicates. This radial correspondence
facilitates presentation of the proper chip and recording of the
testee’s judgment, since a single set of code numbers on the bottom of
the wheel can be used to designate both test chips and comparison chips.
These (inverted) code numbers are easily read by the tester with the aid
of a mirror appropriately positioned on the base beneath the wheel and
not visible to the testee. The mirror and code numbers are indicated in
Section AA of Fig. 3.

Fig. 1. Test instrument with cover open to show arrangement of test
chips (inner circle) and comparison chips (outer circle).

Fig. 2. View of testing situation with tester (right) and testee (left)
and overhead artificial daylight illumination.

A hinged, opaque cover fits over the wheel of hue 
chips, and the handle for rotating the wheel pro- 
trudes through the center of the cover (Fig. 1 and 2). 
Near the edge of the cover, at the testee’s viewing 
position, is a thin, opaque sliding panel into which 
two circular apertures are cut. This panel may be 
seen in Fig. 3. The apertures are located so that 
either a test chip or a comparison chip may be 
viewed clearly, with no disturbing shadows, when 
the panel is appropriately moved by means of either
of two spindles which project through a cam slot 
in the surface of the cover. A small, black block is 
provided for covering the exposure apertures while 
a test chip is being set in position by the tester. 
This block is shown in Fig. 1 and 2 on the table 
next to the test instrument. 

Diametrically opposite the testee’s viewing position 
is the tester’s position. A transparent window is 
located at that position (see Fig. 3) on the vertical 
cylindrical support for the cover, through which the 
tester may observe the code numbers corresponding 
to the particular test or comparison chip being 
viewed by the testee. A simple, spring-loaded, fric- 
tion brake (see Fig. 3) is located next to the window 
so the tester may reduce or prevent movement of 
the wheel while positioning a test chip, or hold it in 
position while the testee views the chip. 

The Farnsworth-Munsell hue chips consist of Munsell papers mounted in
black plastic caps. These caps are set on short posts which project
above the wheel. When the cover is closed the chips cannot fall off the
posts, but when it is opened they may easily be lifted off the posts.
This is done so that the 43 comparison chips can be used either in this
test or in the original Farnsworth-Munsell test of hue discrimination
(2). It is necessary to add only 22 duplicate hues for the test chips
and practice chips in the present test of hue memory.

The hue chips are designated by the code numbers 
used in the Farnsworth-Munsell series. The numbers 
of the comparison colors are 1, 3, 5, 7, 9, etc., to 85. 
The practice chips are numbers 1 and 43, and the 
test chips are “every other” odd number from 5 to 
41 and from 47 to 83. 
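
The numbering scheme just described can be written out explicitly. The short Python sketch below only restates the rule given in the text; the variable names are illustrative and not part of the original description.

```python
# Comparison chips: the odd-numbered hues 1, 3, 5, ..., 85 of the
# Farnsworth-Munsell series.
comparison_chips = list(range(1, 86, 2))

# Practice chips are duplicates of numbers 1 and 43.
practice_chips = [1, 43]

# Test chips: "every other" odd number from 5 to 41 and from 47 to 83,
# that is, every fourth code number within each run.
test_chips = list(range(5, 42, 4)) + list(range(47, 84, 4))

print(len(comparison_chips), len(practice_chips), len(test_chips))
print(test_chips)
```

The three lists contain 43, 2, and 20 chips, which together account for the 65 chips mounted on the wheel.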

The cover of the test instrument may be locked 
in position by a spring clip, and the instrument can 
then easily be transported by means of a handle 
fastened to the cylindrical cover support (see Fig. 1). 
An aperture cover is provided which fits over the 
test and comparison chip apertures when the instru- 
ment is carried or stored. 

Since the testing instrument is not available
commercially,¹ dimensions of the parts are indicated in
Fig. 3, so that it may be duplicated. Most of the 
parts are made from Lucite and all surfaces except 
the hue chips and the window are covered with a 
low-reflecting, flat black paint (about Munsell N 3/). 


Test Administration 


The testee is seated comfortably at a table with the testing device
placed so that the sliding panel used to expose the colors is directly
in front of him (see Fig. 2). The tester may sit either directly
opposite the testee or at one end of the same table. In the latter case,
the test instrument is rotated so that the tester may clearly see the
code numbers through the window. The test instrument is placed on a
large sheet of middle-gray cardboard (about Munsell N 5/) and is
illuminated from above by a Macbeth daylight unit² at 40 ft-c. The test
has been standardized for administration only under average daylight
illumination (6,000°K to 7,000°K). The Macbeth daylight unit actually
used gave a good approximation to 6,500°K. Other illuminants should not
be used.

1 Since the article was written, the authors have been notified that the
test instruments may be obtained from the Munsell Color Company, 10 E.
Franklin Street, Baltimore 2, Maryland.

The testee is instructed as follows: 


“This is a test to find out how well you remember 
colors. I am going to expose a color here, like this, 
for five seconds and ask you to look at it with the 
idea of remembering what it looks like so you can 
pick out the color which looks most like it from a 
large group of colors. Then I shall spin the wheel 
and move the sliding panel, like this, and ask you 
to look away at the black surface of the cover until 
I say ‘now.’ At this signal, look back at the spinning 
wheel of colors and, using this knob, slow down the 
wheel, find the color which looks like the one shown 
you before and stop the wheel with that color show- 
ing in the circular hole. While you are looking at 
the black cover, do not look back at the rotating 
wheel of colors until I tell you. Then find the color 
as quickly and as accurately as possible.” 


2 Macbeth daylight units may be obtained from 
the Macbeth Corporation, P. O. Box 950, Newburgh, 
New York. 


Fig. 3. Scale drawing of test instrument.


The instructions are accompanied by a demon- 
stration using any test chip except practice test 
chips No. 1 and 43. While the tester is setting a 
test chip in position, the small black block described 
above is placed over the exposure apertures. This is 
removed by the tester at the moment the test chip 
is to be exposed (and replaced after the testee has 
made his judgment). At the end of the 5-sec. 
exposure, the wheel is set in rotation so no one hue 
can be seen in the test aperture, and the sliding 
panel is then moved so that the comparison aperture 
is in position for the testee. It should be made clear 
to the testee that the test hues and comparison hues 
are found in different circles of chips. This position 
difference is made clear so that no secondary clues 
such as scratches are used as a basis for judgment. 
No mention is made, however, of the possible use 
of secondary clues. A 5-sec. interval is permitted 
to elapse between exposure of the test chip and the 
time the observer begins to select the comparison 
chip. The observer stops the wheel and then 
rotates the handle to find the chip matching his 
remembered color. If the testee does not look 
steadily at the black cover during the interval 
between the exposure and the judgment, he is again 
reminded to do so. A steady gaze at the black cover is necessary in
order to maintain control of adaptation and of “intervening activities”
which could otherwise be expected to produce differential mnemonic
interference effects.

Fig. 4. Score sheet for hue memory test.

After this demonstration of the procedure, any 
questions the testee may have are answered, and he 
is then given practice by carrying out the full pro- 
cedure on practice test chip No. 1 and then on 
practice test chip No. 43. Questions by the testee 
at this point about “How did I do?” should not be 
answered precisely, but only with the idea of main- 
taining cooperation; for example, “You have the 
idea,” or “You are doing very well,” etc. Then, if 
there are no further questions, the test chips are 
presented in the fixed random order shown on the 
score sheet in Fig. 4. Timing can be accomplished 
by counting carefully to one’s self. For the interval 
between viewing the color and matching it, a count 
of 4 sec. is sufficient before saying “now” because the 
testee will take a second or so to slow the wheel. 

After each judgment, the tester reads the number 
of the comparison chip selected by the testee and 
records that number in the appropriate space on the 
score sheet. 

Testing time for a practiced tester runs from 15 to 
20 min. 

The test is scored by recording the arithmetic difference (disregarding
sign) between the code number of each comparison chip selected by the
testee and the code number of the test chip presented. The two practice
chips are not scored. The score is obtained by adding these differences
and dividing by 2. Then, for example, a score of 20 would mean that the
testee selected chips which, on the average for the 20 judgments, were
one chip (about 2.2 Munsell hue steps) removed from the chip shown.
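
The scoring rule can be illustrated with a brief Python sketch. The function name and the sample judgments are hypothetical. The sketch takes the plain arithmetic difference as stated in the text and makes no allowance for the point where the code numbers wrap around the hue circle, which the text does not discuss.

```python
def hue_memory_score(presented, selected):
    """Sum of absolute code-number differences, divided by 2.

    presented: code numbers of the 20 scored test chips, in order shown
    selected:  code numbers of the comparison chips chosen by the testee
    Practice chips (Nos. 1 and 43) should be left out of both lists.
    """
    if len(presented) != len(selected):
        raise ValueError("one selection is required for each test chip")
    return sum(abs(s - p) for p, s in zip(presented, selected)) / 2


# Hypothetical testee who always picks the next comparison chip
# (code number 2 higher) instead of the duplicate of the test chip.
presented = list(range(5, 42, 4)) + list(range(47, 84, 4))
selected = [p + 2 for p in presented]
print(hue_memory_score(presented, selected))
```

Adjacent comparison chips differ by 2 in code number, so dividing the sum by 2 expresses the total error in chip intervals. A testee who is off by one chip on each of the 20 judgments therefore receives a score of 20, as in the example given in the text, and lower scores mean better performance.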


Evaluation of the Test 


After a period of preliminary testing, the 
test was evaluated for reliability, and an item 
analysis was made to determine relative dif- 
ficulty or homogeneity of the test items. 
Evaluation for validity was not considered 
necessary since the test was designed as a 
direct measure of immediate memory for hue. 
However, inasmuch as the recently developed 
Woods’ Color Aptitude Test (8) is opera- 
tionally a test of immediate color memory, 
results on the present test were correlated 
with results on Woods’ test to determine 
whether one test could be substituted for the 
other. A similar validity coefficient was also 
determined with respect to the Inter-Society 
Color Council Color Aptitude Test (ISCC 
CAT) (4). Validation with respect to some 
occupational criterion was not undertaken, 
but could be determined in any personnel se- 
lection situation where a suitable objective 
criterion can be found. 

A group of 100 technical and clerical work- 
ers was given the hue memory test after it 
was established by the AO plate test (6, pp. 
140-141) that they had normal color vision. 
These individuals were selected because they 
varied from high to low in the amount of 
their training in technical color work. Scores 
of these individuals are arrayed in the lower 
of the two frequency distributions shown in 
Fig. 5. Retests were given from two weeks 
to a month later to 50 of these individuals. 
Those retested were selected on a basis of the 
distribution of scores on the first test by 
taking every other individual along the distribution. In this way, a
retest sample of 50 was obtained which represented the distribution of
scores of the original 100 tested. The test-retest correlation for this
group of 50 was .68 as shown in row 1 of Table 1. This reliability
coefficient was higher by a factor of 2 than any obtained on earlier
forms of the test.

Table 1
Test-Retest Reliability Coefficients

Group                                 N      r      σr

Original                              50     .68    .08
Check                                 30     .59    .07
Original and check groups combined    80     .64    .07

Because many of these observers had been 
given one or more of the earlier forms of the 
test, it was felt that they might have become 
more stable in their judgments than they 
would otherwise have been without such training.
It was possible that the reliability
coefficient could have been influenced by such
stability. Consequently, a check group of 
30 color-normal clerical workers with no train- 
ing in color or in color-vision testing was 
tested, and retested about a week later. Fig- 
ure 5 shows the distribution of the original 
scores for these observers added to the dis- 
tribution for the first group of 100 observers. 
The test-retest correlation for this group of 
30 was .59, as shown in row 2 of Table 1. A 
statistical test of the difference in reliability 
between the two groups showed no significant 
difference in the correlations. Consequently, 
the scores for all 80 who were retested were 
combined and a test-retest correlation of .64 
was obtained (row 3, Table 1). 
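
The text does not say which statistical test was applied to the two reliability coefficients. One common choice for two independent correlations is Fisher's z transformation, and the Python sketch below uses it purely as an assumption; the function name is illustrative.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent r's.

    Returns (z statistic, two-sided p value under the normal curve).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Original retest group (r = .68, N = 50) versus check group (r = .59, N = 30).
z, p = compare_independent_correlations(0.68, 50, 0.59, 30)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under this assumed procedure the z statistic is well below the conventional critical value of 1.96, which agrees with the statement that the two correlations do not differ significantly.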

The Farnsworth-Munsell chips had been 
selected for the present test as a means of im- 
proving its reliability over that of earlier 
forms. It is well known that the more all 
test items approach the same level of difficulty
the more reliable a test will be (3, p. 417).
In the original Farnsworth-Munsell
test, these color chips were selected because 
they approximated visually equally spaced 
hue steps at nearly constant value and 
chroma. They might then be expected to 
be about equally difficult to judge in the 
present case. 

An analysis was made of the relative dif- 
ficulty of the 20 judgments as a check on the 
homogeneity of the test items. Table 2 shows 
the distribution of response frequencies for 
each test item for the 130 testees on the first 
test taken by each. The zero column shows 
the number of testees out of 130 who selected 
the duplicate of the test chip. The columns 
on either side of the zero column show fre- 
quency of response for neighboring com- 
parison chips in both hue directions away 
from the test chip. The numbers at the head 
of the columns show the arithmetic differences 
between the code numbers for the comparison 
chips and the test chips. Code numbers for 
the test chips are listed in the left-hand 
column in hue order, not in the random order 
in which the test chips were presented. 

It may be seen that the minimum range over which comparison chips were
selected was six chips (about 13 Munsell hue steps) as for No. 9
(red-yellow) and No. 13 (reddish yellow). The maximum range was 11 chips
(about 24 Munsell hue steps) as for No. 41 (yellowish green) and No. 55
(blue). These
ranges show that each judgment contributed 
differentially to the total score. No judgment 
was so easy that all or most observers gave the 
same response, nor was any so difficult that 
there was pure random response over any 
considerable range. The maximum modal fre- 
quency of response was 69 out of 130 for test 
chip No. 13 (reddish yellow) and the mini- 
mum modal frequency was 29 out of 130 for 
test chip No. 37 (nearly pure green). There 
is, thus, relative homogeneity among the test 
items. 
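
The tabulation behind this item analysis can be sketched as follows. The function name and the sample selections are hypothetical, and counting the range as the number of comparison chips from the lowest to the highest one selected is an assumption about how the ranges in the text were reckoned.

```python
from collections import Counter


def item_distribution(test_chip, selections):
    """Tabulate responses to one test chip.

    Returns (frequency of each signed code-number difference,
             modal frequency,
             number of comparison chips spanned by the responses).
    """
    diffs = Counter(sel - test_chip for sel in selections)
    modal_frequency = max(diffs.values())
    # Comparison chips are 2 code numbers apart, so the span in chips is
    # half the code-number spread, plus one to count both end chips.
    chips_spanned = (max(diffs) - min(diffs)) // 2 + 1
    return dict(diffs), modal_frequency, chips_spanned


# Hypothetical selections by 11 testees for test chip No. 13.
selections = [13] * 5 + [11] * 2 + [15] * 3 + [9]
freqs, modal, span = item_distribution(13, selections)
print(freqs, modal, span)
```

Each row of a table such as Table 2 corresponds to one call of this kind, with the key 0 giving the number of testees who selected the duplicate of the test chip.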

Indications of deviations from visual equiv- 
alence of hue steps around the color circle 
are found by checking relative frequency of 
response for adjacent comparison chips. The 
higher the frequency in the zero column, as 
compared to adjoining columns, the greater 
the visual interval between the adjacent hues, 
and vice versa. On this basis the largest hue 
intervals are found near No. 13 and the smallest
near No. 37.

A possible, though doubtful, hue shift in 
memory is indicated where the zero column 
has a significantly lower frequency than that 
found in an adjoining column (or where the 
mean frequency for each of several test items 
in a row falls consistently to one side of the 
theoretical [zero] mean). What seem like 
significant differences are found for test chips 
No. 17, 41, 59, and 83. Test chip No. 17 is 
a greenish yellow that showed such a fre- 
quency shift to the next chip, No. 15, which 
is a slightly reddish yellow. Test chip No. 
41 is a slightly bluish green that showed a 
shift to No. 43, which is a bluer green. Test 
chip No. 59 is a reddish blue that showed a 
shift to No. 57, which is nearly a pure blue. 
Test chip No. 83 is a bluish red that showed 
a shift to No. 81, which is a bluer bluish red. 
These “hue shifts” in memory do not show a 
consistent direction of change for which there 
is any obvious explanation, and the maximum 
shift is statistically only about 340. There 
are, apparently, no real hue shifts in this 
memory situation. Even if there were, it 
could be said with certainty that they are not 
due to any large appearance difference be- 
tween test chips and duplicate comparison 
chips. No test chip would ever be confused 
with any comparison chip except its duplicate
by an observer with normal color vision in a 
simultaneous matching situation. 

It was expected, on the basis of visual-re- 
tention test results reported by Cohen, Welch, 
and Fisichelli (1) and results reported by 
Woods (8), that individuals with a great deal 
of training in the technical or artistic aspects 
of color would tend to score high on a test 
of this type and that those with little or no 
training would score low. An analysis of the 
scores reported here, however, indicates that 
such is not the case. 

From the original group of 100 testees, the 
scores of the 20 testees having obviously the 
most training in technical color work were 
compared with the scores of the 20 testees 
having the least or no training in color but of 
roughly comparable intelligence. The mean 
scores for these two groups were, respectively, 
19.4 and 21.7, and the corresponding σ’s were
5.7 and 6.0. The t ratio is 1.3, which is not
significant. This result is interesting for it
suggests that immediate memory for hue dis- 
sociated from configuration or other visual 
factors is not much affected by specific train- 
ing in color. Further support for this notion 
comes from a comparison of results on the 
present test with those on Woods’ Color Apti- 
tude Test. 
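
As a rough check on the comparison of the trained and untrained groups, the t ratio can be recomputed from the summary figures given above. The formula used here, a difference of means divided by the standard error for two independent groups, is an assumption, since the text does not state how the ratio was obtained.

```python
import math


def t_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t ratio for the difference between two independent group means."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


# 20 testees with the most color training versus 20 with the least.
t = t_ratio(19.4, 5.7, 20, 21.7, 6.0, 20)
print(f"t = {t:.2f}")
```

From the rounded means and σ’s this gives a value of about 1.2, close to the reported 1.3. The small gap presumably reflects rounding in the published figures or a slightly different variance formula. Either value falls short of significance for groups of this size.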

Woods’ test is similar to the present test in 
that the task performed involves immediate 
memory for color. It differs from the pres- 
ent test in that varied patterns of one or two 
colors, which vary in saturation and lightness 
as well as hue, are presented individually and, 
after an interval comparable to that in the 
present test, the testee then selects one (or 
none) of four similar patterns as being identi- 
cal in color appearance to the pattern first 
presented. Woods considers his test as one 
of color aptitude, and justifies the name on a 
basis of score differences for internally homo- 
geneous groups which vary widely in artistic 
training. Groups with much training tend to 
score significantly higher than groups with 
little or no training. This result is opposed 
to that obtained on the present test where 
training does not seem to be related to scores. 


Because of this apparent difference in results 
between the two tests, 50 of the original 100 
persons who were given the present test were 
tested on Woods’ test. Twenty-eight of these 
were given Woods’ test before taking the present 
test, and twenty-two were given Woods’ test after 
taking the present test. The correlation between 
the two tests³ was -.23, which becomes -.42 when 
corrected for attenuation. This correlation 
indicates low validity of the present test as a 
substitute for 
Woods’ test. Alternatively expressed, this 
correlation indicates that the two tests are, at 
least in some respects, testing different things, 
which in Woods’ case may depend on special 
training, and in the present case probably 
does not. 
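
The correction for attenuation mentioned above divides the observed correlation by the square root of the product of the two test reliabilities. The following sketch assumes the test-retest reliabilities reported in this study (.64 for the hue memory test and .48 for Woods’ test); the function name is hypothetical. With those inputs the formula gives about -.41, close to the reported -.42; the difference presumably comes from rounding of the published figures.

```python
import math

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Observed correlation divided by sqrt(reliability_x * reliability_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Observed correlation between the two tests and their test-retest reliabilities.
corrected = correct_for_attenuation(-0.23, 0.64, 0.48)
print(round(corrected, 3))
```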

Incidentally, 50 individuals having widely 
varying backgrounds in terms of color experi- 
ence (28 of them common to the group of 50 
just mentioned and 22 new observers) were 


3 Negative correlation results from the mode of 
scoring. High scores on Woods’ test represent good 
performance; low scores on the present test do the 
same. 


R. W. Burnham and J. R. Clark 


tested, and retested several weeks later, on 
Woods’ test to give a test-retest correlation 
of .48 which is statistically significantly dif- 
ferent from the reliability of .64 for the pres- 
ent color memory test obtained with a com- 
parable group. Woods reported a test reli- 
ability on his test of .86 for 64 art students 
who presumably had homogeneous color back- 
grounds. The difference between his .86 for 
a homogeneous group and the .48 reported 
here for a more heterogeneous group is an 
indication that his reported reliability is not 
representative for all groups. It is, of course, 
well known that test reliability is increased 
by increasing the homogeneity of the test 
group (3, p. 418). The difference between 
the reliability of .64 on the present test and 
.48 on Woods’ test, using comparable groups, 
probably reflects differences in the tasks re- 
quired in the two tests. This difference could 
also be partially due to a difference in homo- 
geneity of test items in the two tests. 

An item analysis, comparable to the one 
made for the present test, was made of Woods’ 
test from the data for 76 testees (including 
the 72 reported above). This is shown in 
Table 3. The 25 items on Woods’ test are 
enumerated in the left-hand column, and the 
five choices for each item are designated 
across the top of the column. Frequency of 
response for each choice is shown by the 
numbers in the body of the table. It may 
be seen that these items are not as homoge- 
neous as those in the present test. For ex- 
ample, under item No. 3, choice No. 3 was 
made by all observers, whereas under item 
No. 23 all five choices were made about 
equally by the observers. There was consid- 
erable variation from item to item in the 
spread of judgments among the five choices. 
This lack of homogeneity of items probably 
contributes to the lower reliability of Woods’ 
test. 

It is entirely conceivable that differential 
learning effects (among other things) among 
testees during the first testing on either test 
would help keep the reliability coefficients 
for both tests lower than is usually consid- 
ered desirable for a highly reliable test. 

Differential learning effects would be demonstrated 
if there were consistent improvement by all or 
nearly all observers on the retest, but with 
considerable variation among observers in the 
amount of improvement. A check was made of the 
improvement on both the present test and Woods’ 
test by computing the mean difference in score 
between the test and retest, and the standard 
error of that difference. Score differences on 
the hue memory test (N = 80) ranged from +11 to 
-12, the mean difference was +1.4, and the 
standard error was 4.4. On Woods’ test (N = 50), 
the range of the differences was from +5 to -6, 
the mean difference was -0.2, and the standard 
error was 2.6. These are roughly equivalent 
results for both tests and indicate no consistent 
learning effect among the testees. Conversely, 
these results indicate chance variation around a 
mean of zero difference which may be due to a 
combination of factors. 


Table 3 


Distribution of Response Frequencies for Each Item 
on Woods’ Color Aptitude Test 
(N = 76) 

[Table body illegible in the source.] 

It is possible that uncontrolled variation in 
such factors as motivation, set, and adapta- 
tion contributes to this variation in test and 
retest score and hence to a lower than de- 
sirable reliability on both tests, although an 
attempt was made in both tests to control 
such factors. In the present test an attempt 
was made to motivate the testee positively 
by creating a spontaneously pleasant (rou- 
lette) situation and by keeping the test as 
brief as possible consistent with maintaining 
reliability. Set was somewhat controlled by 
keeping the task simple and by the specific 
instructions. Adaptation was controlled by 
standardizing the illumination and surround, 
by requiring the observer to look at the black 
cover, and by reasonable control of time in- 
tervals. There was, of course, no explicit 
control over the time allotted for judgment, 
though testees were actively encouraged to 
make the judgment quickly and most of them 
discovered during the practice judgments that 
delay created confusion. 

A further, though incidental, validity check 
was made by intercomparing results on the 
present test, Woods’ Color Aptitude Test, and 
the ISCC CAT (5). The latter is a test 
of discrimination of small color differences. 
Three individuals were added to 47 who had 
already taken the first two tests to make up 
a group of 50 who were given all three tests. 
Intercorrelations of scores for this group on 
the three tests are shown in Table 4. These 
validity coefficients are corrected for attenua- 
tion using test-retest reliabilities obtained in 
the present study for the hue memory test 
and for Woods’ test, and the reliability reported 
for the ISCC CAT by Dimmick.⁴ The correlations are 
actually all positive; the negative signs simply 
reflect differences in method of scoring on the 
hue memory test as compared to the other two 
tests. The uniformly low correlations, on the 
other hand, reflect real differences among the 
tests and indicate that the three tests are 
largely measuring different aspects of color 
behavior. It may well be that a general aptitude 
for color work is a function of a number of such 
specific factors which can only be adequately 
sampled with a battery of tests. 


Table 4 


Validity Coefficients (Corrected for Attenuation) Based 
on an Intercomparison of Results for Three Tests 
(N = 50) 


                              Woods’ Color 
                              Aptitude Test    ISCC CAT 

Hue Memory Test                   -.42           -.34 
Woods’ Color Aptitude Test                        .39 

It may be concluded that the reliability of 
the hue memory test reported here is suffi- 
ciently high to make the test a useful one. 
Although the test has not been validated with 
respect to an occupational criterion, it has 
been determined that it is not a substitute for 
Woods’ Color Aptitude Test, even though 
Woods’ test is operationally one of immedi- 
ate color memory. Neither the present test 
nor Woods’ test is a valid substitute for the 
ISCC CAT. Rather, these three tests in 
combination with tests of color vision might 
constitute a useful assessment of a more general 
color aptitude. By itself, the present test is 
certainly valid as a test of hue memory since it 
is a direct test of immediate memory for hue 
dissociated from the factor of configuration. 


4 Dimmick, F. L. Personal communication, 1954. 


Received May 10, 1954. 
Difference in the meaning of the same 
words expressed by different people is a well- 
known phenomenon. Adjectives are particu- 
larly subject to misinterpretation, as is well 
known (4, pp. 141-148). What is not known 
is the magnitude of the variation in the meaning 
of such terms, and the effect of these 
magnitudes on the reliability of results ob- 
tained. To what extent does the existence of 
much variation in the definition of such 
words bias the final results of a study? How 
is this bias affected when weights are used to 
combine the answers of several such classifi- 
cations in one index? 

An opportunity to obtain partial answers to 
these questions in the case of the good, fair, 
and poor trilogy occurred in a survey con- 
ducted by the Bureau of Economic and Busi- 
ness Research of the University of Illinois. 
From a research point of view, the conditions 
under which this survey was conducted were 
not as good as one would hope. However, it 
was felt that an analysis of these data would 
be of wide interest in view of the almost com- 
plete absence of information on this subject, 
and it is in this sense that the material is 
presented here. 


The Sample 


In the autumn of 1950 a mail survey was 
made of the membership of the Associated 
Credit Bureaus of America to obtain credit 
ratings of some 42 selected occupations by 
the managers of credit bureaus or of store 
credit departments. In accordance with simi- 
lar surveys on the subject conducted in 1931 
and 1941, respondents were asked to classify 
the risk rating of each occupation with which 
they had had a fair amount of experience as 
good, fair, or poor. In the present case, the 
question was raised as to the meaningfulness 
of such a classification and it was decided to 


1 The writer would like to thank, and acknowledge 
the assistance of, Miss Ramona Russell and Mrs. 
Frances Dotson for carrying out the calculations on 
which this study is based. 


ask the respondents the following query pre- 
ceding the section on risk ratings: 


How do you define good, fair, and poor 
credit risks in terms of: 


(a) per cent of DEFAULTS: good — % 
fair — % poor — % 

(b) if part (a) cannot be answered, indicate 
the basis of classification which 
you will use in making the occupa- 
tional ratings. 


Response to the questionnaire was not too 
good. Only 205 out of an original mailing of 
800 questionnaires were returned, an over-all 
response rate of 25 per cent. Replies defin- 
ing credit risks were received on 105 of these 
questionnaires. The response rate was the 
same for both credit bureau managers and 
store credit managers. Conditions did not 
permit any tests to be made for the presence 
of mail bias, and so no knowledge is available 
of the representativeness of the sample either 
with respect to the main content of the study 
or with respect to the definition of credit 
risks. 


Problems of Combination 


The ordinarily routine task of obtaining 
distributions by combining individual replies 
poses a perplexing problem in the present 
study for two reasons. One derives from 
the fact that some of the answers are single 
figures whereas others are given in terms of 
ranges. The question of how these two types 
of answers should be combined brings us to 
the second source of difficulty, namely: What 
do the answers mean? If a respondent de- 
fines good (g) as 5 per cent defaults and fair 
(f) as 15 per cent defaults, how will he clas- 
sify an occupation the default percentage of 
which is anywhere above 5 and below 15? 

These problems are not easily resolved. 
Basically, however, there seem to be two 
methods of approach. One is to use the origi- 
nal data as such on the ground that they are, 
for better or for worse, the respondents’ own 
replies and that any changes in them may 
distort rather than clarify their meaning. 
The other approach is, obviously, to adjust 
the replies on the basis of some hypothesis as 
to the meaning or statistics reflected in them, 
thereby presumably improving the compar- 
ability of the definitions. In view of the 
limitations to which results obtained by 
either method would be subjected, it seemed 
advisable to base the analysis on the concurrent 
use of both methods. This does not, of course, 
remove the limitations of each method 
individually, but it does serve to isolate 
results dependent on the particular 
interpretation employed from those that are not. 
Three distinct methods of combination were used. 
They are: 



A. No adjustment of the original data. If a range 
was given, the corresponding point definition was 
computed as the arithmetic average of the two limits, 
except where a definition of p was given as X% 
and over, in which case (X + 1)% was used. 

B. Adjustment of the original data by assuming 
that a point answer for good referred to the upper 
limit of the range for the term; that for fair re- 
ferred to some measure of central tendency; and 
that for poor referred to the lower limit of the 
range for that term. The basis for these assump- 
tions was the fact that a number of answers to the 
good definition were given as a number followed in 
small letters by the words “or less,” and quite a 
few definitions of poor had the words “or more” 
written after the number. 

Although this places one definition in terms of an 
average and two in terms of limits, comparability is 
maintained by converting the two limit definitions 
into averages. For good the average is obtained as 
the arithmetic mean of the response and the lower 
limit, zero; and for poor it is the arithmetic average 
of the response and the upper limit, 100.² 

C. The same adjustment of the original data as in 
“B,” except for the use of a triangular distribution 
in obtaining averages for the definitions of g and p. 
In other words, we assume that the distributions of 
cases which form the bases for the definitions are 
sharply skewed, and that the distribution curve 
slopes down linearly to zero at zero per cent 
defaults for g and 100 per cent defaults for p. 
Averages for the definitions of g and p are obtained 
on the assumption that the ordinate of the curve is 
unity at the point given by the respondent. This 
method of adjustment is perhaps more realistic than 
the preceding one, since default percentages 
undoubtedly do taper off toward the extremes and 
much greater weight in computing the averages is 
accorded the respondents’ own replies. 


2 This averaging process tends to produce some 
bias in the results to the extent that (a) the true 
distributions of good and poor are skewed, and (b) 
the response to fair is not the true mean. However, 
since relatively little skewness was apparent in the 
distribution of fair, and since the same basic results 
were obtained when some of the computations were 
carried out with the responses in terms of limits, 
neither of these two factors would seem to affect 
the main findings of the study. 



In the following pages these three concepts 
will be distinguished by the terms actual 
values, average values, and adjusted aver- 
ages, respectively. 
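
The three concepts can be illustrated with a short sketch. It is an interpretation rather than the procedure actually used: the function names are hypothetical, the poor figure in the example is invented, and the triangular means (two-thirds of the response for good; the response plus one-third of the distance to 100 for poor) follow from the stated assumption of a density that peaks at the response and falls linearly to zero at the end of the scale. How range answers were handled under methods B and C is not stated, so only point answers are covered there.

```python
def actual_value(low, high=None, open_ended=False):
    """Method A: point answers as given; ranges by their midpoint;
    'X% and over' becomes X + 1."""
    if open_ended:
        return low + 1
    if high is None:
        return low
    return (low + high) / 2

def average_value(adjective, point):
    """Method B: good is averaged with the lower limit 0, poor with the
    upper limit 100; fair is taken as already central."""
    if adjective == "good":
        return (0 + point) / 2
    if adjective == "poor":
        return (point + 100) / 2
    return point

def adjusted_average(adjective, point):
    """Method C: mean of a triangular density peaking at the response and
    falling linearly to zero at 0 (good) or at 100 (poor)."""
    if adjective == "good":
        return 2 * point / 3
    if adjective == "poor":
        return point + (100 - point) / 3
    return point

# Hypothetical respondent: good = 5, fair = 15, poor = 40 per cent defaults.
reply = {"good": 5, "fair": 15, "poor": 40}
averages = {adj: average_value(adj, v) for adj, v in reply.items()}
adjusted = {adj: adjusted_average(adj, v) for adj, v in reply.items()}
print(averages)
print(adjusted)
```

Under method C the averages stay closer to the respondent’s own figure than under method B, which matches the remark that greater weight is accorded the respondents’ own replies.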

The end result of these adjustments was a 
sample of 69 replies, 39 from credit bureau 
managers and 30 from store credit managers. 
These replies form the basis for this study. 
To what extent they are representative of the 
population is a moot point in view of the low 
effective rate of response secured for this pur- 
pose, only 69 out of 800. However, despite 
these shortcomings, these data provide, per- 
haps for the first time, some idea of the dis- 
persion in the meaning of adjectives used in 
a sampling study and of their possible effect 
on the results. 
The Distribution of Meanings 


The three panels of Fig. 1 portray the dis- 
tribution of g, f, and p by each of the three 
concepts. The picture Fig. 1 presents is one 
of fairly close unanimity of opinion, surprisingly 
so in some ways. A pronounced concentration of 
replies at one point is evident 
for each of the three adjectives and for each 
concept, though the exact location of the 
peak varies according to the concept em- 
ployed—except, of course, for f, which is the 
same under all three concepts. 

As may be seen from Parts A and B of 
Table 1, however, the change in concepts 
does not greatly affect the differences that 
seem to exist in the relative uniformity of the 
definitions. The uniformity of opinion is 
greatest when the average-value concept is 
used, in which case the modal and adjoining 
intervals account for 94 per cent of the defi- 
nitions of g and slightly over 60 per cent of 
the definitions of f and p. 

As is evident from a comparison of the 
standard deviations of the various distribu- 
tions in Part C of Table 1, the relative dis- 
persion of the definitions is affected by the 
particular concept employed mainly for p. 
Particularly surprising is the much greater 
uniformity in the definition of g than of p; 
the standard deviation of the former is little 
more than a third that of the latter in all 
cases.³ The explanation for this is undoubtedly 
the concentration of the former definition 
much closer to the end of the scale, thereby 
leaving so much less room for dispersion, a 
phenomenon that one may ascribe, if in a 
benevolent mood, to the fundamental honesty 
of most people. 

One reason why the degree of agreement on 
the definitions of g, f, and p may be higher 
than shown here is the finding of marked dif- 
ferences between the definitions according to 
the listing of the occupations on the question- 
naire. To detect and eliminate possible bias 
in the replies arising from listing the occupa- 
tions in any particular order, two forms of 
the questionnaire were used (1), one listing 
such generally considered “high” occupations 
as doctor, lawyer, teacher, business executive, 

3 The relative variability is greatest for “good,” 
but in view of the very low value of the mean of 
that distribution, absolute dispersion seems to be 
more meaningful in the present case. 


Table 1 


Selected Characteristics of Frequency Distributions 
of g, f, and p 


Adjective    Actual Values    Average Values    Adjusted Averages 


A. Per cent of Observations in Modal Interval * 

Good         43               59                59 
Fair         32               32                32 
Poor         35               36                38 


B. Per cent of Observations in Modal and 
Neighboring Intervals * 

Good         80               94                88 
Fair         62               62                62 
Poor         36               61                48 


C. Means and SD’s of Good, Fair, and Poor 


Good X 8.8 x 5.8 


Fair X 24.2 


Poor X 


* The width of all intervals was 5 per cent. 


first, and the second form listing them last 
with laboring occupations first. Elimination 
of this bias by studying the replies to each 
form separately produced reductions in many 
of the standard deviations shown in Table 1. 
(It might be noted at this point that correc- 
tion of this bias in the sections to follow did 
not alter the nature of the results obtained 
therein.) 


Consistency 


The definitions of any two of the three 
terms studied will be said to be consistent if 
the order of the responses is the same for 
both definitions. In other words, the defi- 
nitions of, say, g and f are consistent if 


$F_i \gtrless F_j$ according as $G_i \gtrless G_j$, 


where $F_i$ and $F_j$ are the percentage ratings of 
f for the ith and jth respondents, and $G_i$ and 
$G_j$ have corresponding meanings for g. For 
comparative purposes the effect of variability 
of the definitions is greatly minimized if the 
definitions can be shown to be consistent in 
this sense, whereas low consistency would 
indicate that people who are relatively strict 
in defining one adjective are not necessarily 
so in defining other adjectives. In this sense, 
consistency may be interpreted as a measure 
of the internal reliability of the replies, for 
in the absence of consistency the value of the 
particular question used would seem to be 
highly doubtful. 

The test of the consistency of the defini- 
tions as thus interpreted involved tallying, 
for any two definitions, the proportion of 
time that the ith respondent’s definition of 
one term exceeds the jth respondent’s defini- 
tion for all times that the ith respondent’s 
definition of another of the terms exceeds the 
jth respondent’s definition of that term. The 
resultant proportion, doubled and then reduced 
by one (which is done to allow C to vary between 
0 and 1 rather than between .5 and 1), may be 
labeled the consistency coefficient, or C. Thus, 
the consistency coefficient for g and f would be: 


$$C_{gf} = 2 \cdot \frac{\text{number of times } F_i > F_j \mid G_i > G_j}{\text{number of times } G_i > G_j} - 1$$ 

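
The tally behind the consistency coefficient can be sketched as follows. This is a minimal illustration, assuming that C is twice the conditional proportion minus one (the reading under which C runs from 0 for no relation to 1 for perfect consistency) and that pairs tied on the second definition count as not exceeding. The function name and the sample replies are hypothetical.

```python
from itertools import permutations

def consistency_coefficient(first, second):
    """C = 2 * P(second_i > second_j | first_i > first_j) - 1,
    tallied over all ordered pairs of respondents."""
    conditioning = 0  # pairs in which respondent i exceeds j on the first term
    concordant = 0    # of those, pairs in which i also exceeds j on the second
    for i, j in permutations(range(len(first)), 2):
        if first[i] > first[j]:
            conditioning += 1
            if second[i] > second[j]:
                concordant += 1
    if conditioning == 0:
        raise ValueError("no pair of respondents differs on the first definition")
    return 2 * concordant / conditioning - 1

# Hypothetical per cent-of-defaults definitions from five respondents.
good = [2, 5, 5, 10, 1]
fair = [10, 15, 20, 25, 10]
c_gf = consistency_coefficient(good, fair)
print(round(c_gf, 2))
```

In this example nine ordered pairs differ on good and eight of them agree in order on fair, so the coefficient is 2(8/9) - 1, or about .78.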
With three adjectives, there are three such 
coefficients to be computed. The values of 
these coefficients obtained for each of the 
three concepts, shown in Table 2, would seem 
to indicate that the definitions of adjacent 
adjectives are consistent to a limited extent, 
though consistency between g and p is doubtful. 
When we consider that a value of .52 
for C is roughly equivalent to a value of .5 
for the coefficient of determination, the effect 
of the definition of one adjective on that of 
another adjective would not seem to be too 
high. 


Effect of Differences on Final Results 


The main object of the survey was to ob- 
tain a credit-rating percentage for each oc- 
cupation which could then serve as a basis 
for ranking the occupations in terms of credit 


Table 2 


Consistency Coefficients Between Pairs of Definitions 


Concept              Good and    Fair and    Good and 
                     Fair        Poor        Poor 

Actual values        .46         .42         .04 
Average values       [illegible] .44         .18 
Adjusted averages    .46         .42         [missing] 





Robert Ferber 


risk. This was accomplished in the earlier 
study published in 1931 by more or less 
arbitrarily assigning values of 100, 60, and 
20 to g, f, and p, respectively, and obtaining 
the occupational average by weighting these 
values by the number of times each category 
was checked for that occupation. Assuming 
that the definitions of g, f, and p in terms of 
the percentage of defaults represent the true 
values for these terms (actually the comple- 
ments of the true values), what bias, if any, 
would result if the occupational credit rat- 
ings in the present study were computed by 
using the same values as before or, more 
broadly, by using various other sets of values 
that might be imputed to these terms on the 
basis of a priori reasoning? Or, considering 
the purpose of this study, to what extent 
does the use of arbitrary values for g, f, and 
p affect the credit-rating order of the various 
occupations as compared with the order ob- 
tained when the numerical definitions of these 
terms are used? 

Estimates of this bias were obtained under 
three alternate, arbitrary assumptions as to 
the values of g, f, and p. They are: 100, 60, 
20, respectively; 30, 20, 10; and a deliber- 
ately distorted set of values—90, 25, and 5. 
Occupational credit ratings were obtained for 
22 occupations, half of the total listed on the 
questionnaire, using each of these sets of 
values in turn. Rank correlations were com- 
puted between each of these sets of credit 
ratings and the complements of the ratings 
obtained when the numerical definitions of 
g, f, and p were substituted for the terms. 
Since there is no firm basis for believing one 
concept to be the “true” one, the three arbi- 
trary weighting systems were applied under 
the alternate assumption that each of the 
three concepts is the “true” one. The nine 
correlation coefficients obtained in this man- 
ner are shown in Table 3. 
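
The computation described above can be sketched in a few lines. The occupation labels and the counts of good, fair, and poor checks are invented for illustration, and the sketch compares the arbitrary weight sets only with one another, not with the respondents’ numerical definitions. One point it makes concrete: the set 100, 60, 20 equals four times the set 30, 20, 10 minus 20, so those two sets must always rank occupations identically, which agrees with the matching first two columns of Table 3.

```python
def credit_rating(counts, weights):
    """Weighted average of the values assigned to g, f, and p, weighted by
    how often each category was checked for the occupation."""
    return sum(c * w for c, w in zip(counts, weights)) / sum(counts)

def ranks(values):
    """Ranks from 1 (smallest) upward, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            result[k] = (pos + end) / 2 + 1
        pos = end + 1
    return result

def rank_correlation(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical counts of (good, fair, poor) checks for four occupations.
occupations = {
    "occupation A": (50, 8, 2),
    "occupation B": (30, 25, 5),
    "occupation C": (10, 30, 20),
    "occupation D": (5, 15, 40),
}

def ratings(weights):
    return [credit_rating(c, weights) for c in occupations.values()]

rho_linear = rank_correlation(ratings((100, 60, 20)), ratings((30, 20, 10)))
rho_distorted = rank_correlation(ratings((100, 60, 20)), ratings((90, 25, 5)))
print(rho_linear, rho_distorted)
```

With these invented counts even the deliberately distorted set 90, 25, 5 leaves the order of the occupations unchanged.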

This table seems to demonstrate in a con- 
clusive manner that the rank order of the oc- 
cupations in terms of credit ratings is for all 
practical purposes invariant to the system of 
values imputed to the meaning of g, f, and p. 
The fact that all of the correlation coefficients 
are so close to unity seems to imply that al- 
most any reasonable set of values—which we 
define as any set ranking f above p and g 
above f—will reveal the true rank order of 


Table 3 

Rank Correlations Between “True” Occupational 
Ratings and Those Based on Selected 
Arbitrary Values 


                     Set of Arbitrary Definitions 
                     of g, f, and p 

Concept              100, 60, 20    30, 20, 10    90, 25, 5 

Actual values        .995           .995          .993 
Average values       .998           .998          .995 
Adjusted averages    .998           .998          .995 


the various occupations. This finding may 
seem particularly surprising in view of the 
relatively low consistency found previously in 
this study. However, the two results are not 
contradictory. The reason for the absence of 
bias in using arbitrary weights for g, f, and p 
lies in the fairly clear distinctions between 
the meanings of these adjectives. Because of 
this distinction, combination of the individual 
ratings—which is simply a weighted average 
of the number of times an occupation receives 
g, f, and p ratings—is bound to yield much 
the same result as long as any “reasonable” 
system of values is used for the adjectives. 
Since the result is in the form of an aggre- 
gate, it is largely independent of the consist- 
ency of the ratings, which is a measure of 
individual behavior. 

In fact, these findings suggest as a tenta- 
tive hypothesis for future study that the rank 
order of a series of attributes classified ac- 
cording to any particular characteristic will 
be invariant, for all practical purposes, to any 
reasonable method of deriving the ranking as 
long as little overlap exists in the respondents’ 
thinking between the meanings of the various 
possible answers of that characteristic.⁴ 


Conclusions 


This study has attempted to arrive at some 
idea of the variation that exists in people’s 
minds regarding the meanings of g, f, and p 
as used in sample surveys. In this particular 
case, the adjectives were not found to have 


4 Much the same point is made by Gulliksen (2, 
pp. 312-327) in discussing the combination of test 
scores. He points out that changes in weights will 
only alter the sets of scores obtained when the stand- 
ard deviation of the set of weights is very large in 
relation to their mean, as when positive and nega- 
tive weights are used. 
as much variation as one might expect, and 
surprisingly little overlapping was found to 
exist between the areas over which these 
variations extended. The modal points of the 
definition of g, f, and p in terms of percent- 
age of defaults are well defined irrespective 
of the basis on which the definitions are in- 
terpreted, and the definitions are closely 
clustered around these points. 

The consistency between the definitions 
does not seem to be too high. Despite this 
low consistency, however, combination of the 
respondents’ replies yields over-all credit rat- 
ings for the various occupations, the order of 
which tends to be largely invariant of the 
method of combination used. It is suggested 
that the explanation of this phenomenon de- 
rives partly from the tendency of the defini- 
tions of g, f, and p to cover mutually exclu- 
sive areas. Substantiation of this hypothesis 
would have important consequences for prac- 
tical sampling work. 

These findings proved to be more or less 
invariant as to which of our three 
interpretations was placed on the definitions 
of g, f, and p, and to this extent the 
findings carry additional weight with respect 
to this particular study. However, these re- 
sults cannot easily be generalized, and the 
meanings of the gradational adjectives in par- 
ticular cannot be divorced from the context 
in which they are used. Thus, the distribu- 
tion of these adjectives might be entirely dif- 
ferent if people were asked to define g, f, and 
p with reference, say, to an election turnout. 
The distributions would probably also be al- 
tered appreciably if a different number of 
adjectives were used—if, for example, the 
adjective excellent were added to the three al- 
ready used. 


Received May 10, 1954. 
In many questionnaires utilizing fixed response 
categories, a middle alternative is 
placed as a bridge between “satisfied” and 
“dissatisfied” categories. It may be labeled 
“undecided,” “uncertain,” “?,” or the like. 
Researchers often seem to be uncertain them- 
selves about just what such a response means, 
and even whether it actually is representative 
of the respondents’ judgments. 

On the basis of a survey of union members 
(2), this paper is an attempt to provide some 
answers to the question of whether checks in 
the middle category can be considered as 
valid evidence that individuals really have not 
made definite judgments. The criterion of 
validity used in this study was evidence of 
neutrality. 

A little background of the survey may be 
useful in providing a setting. The study was 
made in a midwestern, regional union with 
25,000 members. It was conducted by means 
of a questionnaire mailed to a random sample 
of members and supplemented by interviews 
with a sample of nonrespondents. The ques- 
tionnaire dealt with opinions about collective 
bargaining procedures, grievance handling, the 
role of the business agent, union meetings, 
and union political action. 

The opinion items, by and large, were 
three-part statements, the first on norms (the 
“shoulds” and “should nots” of a given social 
situation), the second on perceptions (what 
is subjectively experienced as existing in a 
given social situation), and the third on 
evaluations (the judgment of a given situa- 
tion, in terms of the degree to which norms 
are perceived as being met). The norm and 
perception statements had the response cate- 
gories: always, usually, sometimes, seldom, 
and never, and, in addition, the perception 
statements had a category of don’t know. 
The evaluation statements had the response 
alternatives: strongly agree, agree, undecided, 
disagree, and strongly disagree. The rationale 


behind these response alternatives was pre- 
sented in an earlier article (1). 

One of the major concerns, in analysis, was 
to discover the nature of relationships among 
the three parts of each question. Much of 
the analysis of relationships was done in 
terms of three arbitrarily defined evalua- 
tion groups: satisfied (those who answered 
strongly agree or agree), undecided, and dis- 
satisfied (those who answered strongly dis- 
agree or disagree). 

One of the most illuminating steps in the 
analysis, in terms of this paper, was a very 
simple one: it included a determination of 
the percentage of each evaluation group that 
had answered “don’t know” to the perception 
part of each question, as well as the computation 
of the average¹ don’t know response for each 
evaluation group on all opinion questions. In 
all cases, the undecided group had 
a significantly² larger proportion of don’t 
know responses on each item than did either 
the satisfied or dissatisfied group. The aver- 
age percentages of don’t knows for all items 
were 37 for the undecided group, 5 for the 
satisfied group, and 7 for the dissatisfied 
group. 
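Footnote 1's claim that percentages based on a constant N can be averaged without bias can be checked with a short sketch. The counts below are hypothetical and are not taken from the survey; the function names are illustrative only.

```python
def mean_of_percentages(counts, n):
    """Unweighted mean of per-item percentages, each based on the same n."""
    return sum(100.0 * c / n for c in counts) / len(counts)


def pooled_percentage(counts, n):
    """Percentage computed after pooling all responses across items."""
    return 100.0 * sum(counts) / (n * len(counts))


# Hypothetical don't-know counts on four items, each answered by 50 members.
dont_know_counts = [20, 15, 25, 10]
n_per_item = 50

avg = mean_of_percentages(dont_know_counts, n_per_item)
pooled = pooled_percentage(dont_know_counts, n_per_item)
print(avg, pooled)  # the two figures coincide because n is constant
```

If N varied from item to item, the two figures would generally differ, and the simple average would overweight the items with fewer respondents.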

It made logical sense that members who 
did not know what was being done might well 
be undecided, rather than satisfied or dissatis- 
fied, in their judgments. In terms of this 
logic, the findings could be interpreted as 
meaning that a considerable segment of those 
answering “undecided” were neutral; i.e., 
they really had not made up their minds 


1 The percentages of don’t know responses for each 
evaluation group on all opinion items were averaged. 
Because the percentages were based on a constant N 
from item to item, each percentage carried an equiva- 
lent weight and therefore could be averaged without 
bias. 

2 The proportions of don’t know responses of the 
undecided group were sufficiently large in compari- 
son with those of the satisfied group and of the dis- 
satisfied group to justify an assumption of signifi- 
cance without actual statistical computations. 
Fig. 1. Schematic organization of data in preparation
for mean norm-perception deviation indices.


about how they felt, because they did not 
think they had the information with which 
to form a judgment.³

Such an interpretation, however, was pos- 
sible for only part of the undecided group. 
What of the members who had not answered 
“don’t know” to the perception part of the 
item? Analysis of that segment of the un- 
decided group, as well as of the satisfied and 
dissatisfied groups, was done in terms of rela- 
tive homogeneity (similarity in response be- 
tween norms and perceptions). Homogeneity 
was measured in terms of mean norm-percep- 
tion deviation indices for each of the evalua- 
tion groups. Schematically the data were or- 
ganized for the analysis as shown in Fig. 1. 

In determining the mean deviation index, 
an attempt was made to characterize each 
evaluation group in terms of its deviation 
(regardless of direction) from complete norm- 
perception comparability, i.e., from the cases 
falling in the boxes that the O diagonal 
passes through (see Fig. 1). An assumption 
of equal-appearing intervals (see diagonal 
weightings of 1, 2, 3, and 4 in Fig. 1) had to 


3 A further step was taken to gain insight into that 
segment of the undecided group that answered “don’t 
know.” Those members answering “don’t know” 
were compared to those indicating definite perceptions
by means of χ² on five personal characteristics:
age, education, length of union membership, book 
classification, and pay. The major finding was that, 
in all but one item, the shorter the time members 
had been in the union, the less likely they were to 
think they knew what was going on. This made 
sense, since one commonly assumes that experience 
in a situation tends to increase an individual’s feel- 
ing of knowledge about the situation. 
be made in the derivation of the mean deviation
index because nonparametric statistics
did not have the capacity to provide the
analysis that was sought, i.e., varying degrees
of norm-perception comparability. The mean
deviation index was derived by using the following
formula:

$$\bar{X}_D = \frac{\sum fX}{N},$$

where X is equal to the weight attributed to the cases falling
along the given diagonals; f the number of
cases covered by the given diagonals; and N
the total number of cases considered. A
measure of variability of the index was derived
by using the following formula:

$$\sigma_D = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2}.$$

Significant differences in relative homogeneity
of the three evaluation groups were derived
by using the following formula:

$$\frac{\bar{X}_{D_1} - \bar{X}_{D_2}}{\sqrt{\dfrac{\sigma_{D_1}^2}{N_1} + \dfrac{\sigma_{D_2}^2}{N_2}}}.$$


It had been reasoned that the satisfied 
group would be relatively homogeneous, that 
satisfaction would tend to result from seeing 
done what one thought should be done. On 
the other hand, it had been reasoned that the 
dissatisfied group would be relatively hetero- 
geneous, dissatisfaction tending to spring from 
seeing either more or less being done than 
one thought should be the case (1). For the 
undecided group, it had been reasoned that 
there would be significantly less homogeneity 
than was true of the satisfied group and sig- 
nificantly more homogeneity than was true of 
the dissatisfied group. The logic behind such 
a prediction was that neutrality would result 
from the lack of a sufficiently strong stimulus, 
arising from the norm-perception relationship,
to bring about a definite judgment. If undecided
responses followed this pattern, one
could conclude that they were valid. 

In the analysis of significant differences in 
relative homogeneity between the evaluation 
groups, the satisfied group was found to be 
significantly more homogeneous than the dis- 
satisfied group in 25 of the 27 items, but it 
was significantly more homogeneous than the 
undecided group on only four of the items. 
The undecided group, however, was signifi- 
cantly more homogeneous than the dissatis- 
fied group on nine items. In none of the 
items was the undecided group significantly 
different from both the satisfied and dissatis- 
fied groups, and in 14 items the undecided 
group was not significantly different from 
either the dissatisfied or satisfied groups (see 
Table 1). 

These findings, for the segment of the un- 
decided group that indicated definite percep- 
tions, cast considerable doubt on the validity 
of middle category responses, since in no case 
was the undecided group both significantly 
less homogeneous than the satisfied group and 
significantly more homogeneous than the dis- 
satisfied group. It must be remembered, how- 
ever, that this conclusion springs from the 
reasoning presented earlier. 

Although the data presented above do not 
provide any basis for interpreting the signifi- 
cant differences that were found to exist be- 
tween the undecided group and either of the 
other evaluation groups, it may be interest- 
ing to speculate on the meaning of these dif- 
ferences. One possibility could be that the 
undecided members were camouflaging their 


Table 1

Significant Differences in Homogeneity (Mean Norm-Perception Deviation) Between Evaluation Groups
Including Designations of Means and Standard Deviations

Columns: X̄_D in terms of units of deviation (Dis., Sat., Und.); σ_D in terms of units of deviation (Dis., Sat., Und.); significant difference between X̄_D's (U vs. S†, U vs. D‡, S vs. D‡)





aa : 1.51 793 
1.96 987 

59 a 1.93 .933 
52 : 2.12 829 
1.26 913 

50 j 1.64 823 
37 83 1.60 698 
. 1.25 872 

53 ; 2.21 .799 
.60 : 1.68 878 
2.11 829 

50 ; 1.73 734 
.29 : 1.22 651 
44 e. 1.83 .680 
42 : 2.33 765 
A7 a 1.85 880 
89 i 2.15 1.060 
76 . 2.25 .906 
53 m2 1.66 818 
.62 t 2.17 1.006 
54 : 2.39 825 
49 76 1.99 896 
42 72 1.52 745 
71 1.06 2.05 1.194 
69 80 1.92 1.142 
53 1.23 2.30 .933 
59 1.42 2.03 764 


1.311 
1.568 
2.072 
1.013 
1.082 

941 
1.048 
1.168 

829 


1.66 .66 
76 1.63 
1.79 1.20 
2:35" 1.68 
Be 87 
1.13 1.57 
1.02 2.03* 
A7 1.10 
1.98* 1.66 
1.066 1.42 1.17 
1.254 1.67 1.86 
889 , 1.59 1.42 
.997 .62:! 1.26 83 
891 2.46* 1.26 

.918 1.95 2.60** 
1.719 1.53 .69 
1.094 : .93 2.02* 
1.048 1.65 2.07* 
1.026 1.63 1.11 
1.016 1.12 235° 
1.290 ‘ 88 2.53" 
1.079 j 64 2.28* 

891 8: 88 1.45 
1.403 . 71 1.94 
1.229 i 24 2.1%" 
1.150 1.63 2.14* 

882 : 2.24* 1.45 


2.38" 
3.02** 
3a0°° 
4.32** 
1.68 
rw i eg 
3.00** 
1.50 
4.31** 
yt hag 
3.67** 
3.08** 
2.02* 
3.66** 
4.90** 
2.30* 
3.32°° 
4.38** 
2.76** 
aon 
3.70" 
3:13°° 
2.08* 
t FY su 
251° 
397°" 
4.00** 



† Less homogeneity (greater mean norm-perception deviation) in first-mentioned group on all items.
‡ More homogeneity (less mean norm-perception deviation) in first-mentioned group on all items.


* Significant at the .05 level. 
** Significant at the .01 level. 





“Undecided” Answers in Questionnaire Responses 


“true” judgments. For instance, in the four 
cases where they were significantly different 
from the satisfied group and indistinguishable 
from the dissatisfied group, they might have 
been afraid to admit dissatisfaction. In the 
nine cases where they were significantly dif- 
ferent from the dissatisfied group but indis- 
tinguishable from the satisfied group, it seems 
less logical to posit fear as a basis. One 
could hypothesize, however, that they might 
have been dissatisfied with the union as a 
whole, satisfied with the particular item but 
unwilling to give credit for it, or that their 
feeling was sufficiently mild that they were 
reluctant to give any impression of intensity. 
One could further speculate that, if a situa- 
tion arose where a choice had to be made, 
those answering “undecided” would tend to 
move toward the evaluation group to which 
their norm-perception relations were com- 
parable. But, of course, the data provide no 
real evidence relative to these points. 

Two segments of the undecided group have 
been considered. What conclusions can be 
drawn for the group as a whole? The two 


segments present considerably different pic- 
tures with respect to the validity of their re- 


sponses. The “undecided” judgments of mem- 
bers who said they did not know what was 
being done on any point would appear to be 
valid—in terms of both (a) the fact that 
don’t know responses were significantly less 
prevalent in the other evaluation groups, and 
(b) a logical basis for failing to make a definite
judgment (thinking they had inadequate
information). There is no evidence for the 
validity of the remainder of the undecided responses,
however, and, consequently, it is impossible
to draw conclusions for the group as
a whole. 

If these findings are typical, the question 
then arises whether or not an undecided 
category is a useful one. In terms of the 
standard evaluation type questionnaire, inter- 
pretation of such a category may well be mis- 
leading, and the researcher may be on safer 
ground to omit it altogether. In terms of a 
tripartite question design, such as was used 
in this study, it also may be well to omit 
the undecided category, even though greater 
analysis of it is possible than is true with the 
standard questionnaire design. It may be 
hypothesized that without such a category 
the members who are undecided because of 
lack of information would leave the evalua- 
tion part of the question blank; whereas the 
remainder might be lured into committing 
themselves, especially if the response cate- 
gories were made less extreme, e.g., by prefac- 
ing the agree and disagree categories with 
“moderately” or some other limiting adjec- 
tive. Blanks could be analyzed in the same 
way as undecided responses were in this pa- 
per, and would be equally useful, but perhaps 
less likely to occur. 
Experimenters in the field of attitude measurement
who have an attitude questionnaire
composed of two or more subscales face the 
same perplexing problems. Should attitude 
items in each subscale be grouped on the 
questionnaire or should these items be scat- 
tered at random? Obviously, the advantages 
in scoring when items are grouped are large. 
On the other hand, advocates of randomized 
ordering of items would argue that such group- 
ing might result in different answers by the 
respondent. 

Work done on this question in terms of 
attitude scaling is limited. The evidence from 
two separate studies by Baehr (1) and Metz- 
ner and Mann (2) indicates that grouping has 
little effect on responses. In both studies, the 
grouped items were labeled by an appropriate 
category heading which would tend to make 
grouping stand out and increase the chance of 
differential responses. Metzner and Mann 
presented indeterminate results and concluded 
that no definite answer could be given to the 
question; Baehr obtained almost identical 
morale profiles from respondents and con- 
cluded that grouping had no effect. 


University of Minnesota 


Our study presents further evidence on 
the effect or noneffect of grouping items in 
logical categories on an attitude questionnaire. 
This study itself was done as part of a more 
extensive research project being conducted on 
union members’ attitudes and feelings toward 
their union by the Industrial Relations Center 
at the University of Minnesota.¹


Method 


An attitude questionnaire composed of 77 items 
was presented to two separate union groups. In 
each case, half of the questionnaires had items placed 
at random; half had items grouped into the follow- 
ing seven categories: Unionism in General, Local 
Union, Local Policies and Practices, Local Officers, 
Local Union Administration, National, and Diag- 
nostic. None of these categories was labeled on the 
questionnaire. Questionnaires were given at a union 
meeting to Union I, a clothing-manufacturing local 
composed of female members. Union II, an indus- 
trial local, all male, received questionnaires by mail. 


Results and Discussion 
Tables 1 and 2 show the mean scores re- 
ceived on six subscales (diagnostic items are 


1 Funds for this research were made available 
through a grant from the Graduate School, Univer- 
sity of Minnesota. 


Table 1 


Mean Differences Between Grouped Items and Mixed Items on Six Union-Attitude Subscales with 
Significance Test Results for Union Group I 


Union Group I 


Local Local 


General Union 


Item 


Union-Attitude Subscale 


Pol. & Pract. 


Local 
Union Admin. 


Local 


Officers National 





Grouping N Mean SD N Mean SD 


Grouped 
Mixed 


15.35 
7.42 


17 19.53 3.65 


21 19.57 1.98 21 


19.55 2.86 37 


Total 


35 


54.49 11.80 ; 


N Mean SD 


16 21.13 6.21 
23.33 2.51 


22.38 4.62 37 31.65 5.54 38 22.58 3.73 32 19.63 4.20 


N Mean SD 


N Mean SD 


N Mean SD 
16 30.13 7.12 
21 32.81 3.50 


17 21.65 4.79 15 18.87 5.50 
21 23.33 2.33 17 20.29 2.45 





t value for 


mean difference 1.35 04 


1.44 


1.46 1.35 
Grouping Scale Items and Union-Attitude Measurement 


Table 2 


Mean Differences Between Grouped Items and Mixed Items on Six Union-Attitude Subscales with 
Significance Test Results for Union Group II 





Union Group II 





Union-Attitude Subscale 





Local 
Union Admin. 


N Mean S 


Local 
Officers 


N Mean SD 


Local 
Pol. & Pract. 


Local 
General Union National 
Item —- 


Grouping N Mean SD 





N Mean SD N Mean SD 


33 58.09 10.42 34 20.18 3.88 
38 55.58 14.20 36 18.83 4.79 


N Mean SD 


34 21.44 4.00 
36 20.72 5.48 


70 19.49 4.41 70 21.07 


33 21.45 
36 21.11 


29 19.69 5.32 
36 19.97 6.06 


5.11 
4.76 


34 33.85 7.16 
37 32.38 6.70 


Grouped 
Mixed 
Total 71 12.63 


56.75 


71 33.08 6.98 


69 21.28 4.90 65 19.85 5.72 


t value for 


mean difference .85 1.30 


* Significant at the .05 level. 


scored separately, not as a scale) for both 
union groups. 

Significance tests indicate that only one of 
the twelve mean differences between mixed 
and grouped samples is statistically significant 
—that of Officers, in Union Group II. There 
is no pattern of mean differences. 
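The significance tests can be illustrated from the summary statistics alone. The sketch below assumes an unpooled standard error (the article does not say which form of the t ratio was used) and takes one legible pair of rows from Table 1, N = 17, mean 19.53, SD 3.65 against N = 21, mean 19.57, SD 1.98, to be a grouped and a mixed sample on the same subscale.

```python
import math


def t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """t ratio for two independent means from N, mean, and SD.

    Uses the unpooled standard error sqrt(sd1^2/n1 + sd2^2/n2);
    this choice is an assumption, not the authors' stated method.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


# Grouped (N=17) versus mixed (N=21) rows as read from Table 1.
t = t_from_summary(17, 19.53, 3.65, 21, 19.57, 1.98)
print(round(abs(t), 2))
```

The result is close to zero, which agrees with the .04 entry that survives in the row of t values for Union Group I.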

In addition, Union Group I has higher 
mean scores on every subscale for the mixed 
questionnaires, while Union Group II has 
higher mean scores on five of six subscales on 
the grouped questionnaires. These opposite 
results tend to support a hypothesis that 
grouping of items has no consistent effect on 
mean scores. 

One possible explanation for the opposite 
mean scores in the two samples could be a 
sex difference. Union Group I was made up 
of female members; Union Group II was all 
male. If this were the case, grouping of items 
still might have a consistent effect to push 
mean scores up or down. However, evidence 
from unpublished past developmental work 
on the union-attitude scale with 729 males 
and 92 female union members indicates that 
such a sex difference does not exist. Very 
small mean differences on subscales were the 
rule. 

2.37* d .20 

From these two samples, the basic conclusion
is that grouping of items on an attitude
scale has no consistent effect on results. 
Ordering of items evidently can be based on 
the preference of the experimenter. No evidence
was found that the grouping of attitude items
affects results.


Summary 


A 77-item union-attitude scale was given 
to two different union groups. Half of each 
group received a questionnaire with items 
grouped by subscale; half of each group re- 
ceived a questionnaire with ungrouped items. 
Only one of twelve mean differences on six 
subscales between grouped and mixed was 
significant. On over-all means, the grouped 
sample was high in one union group, low in 
the other. No changes in results from group- 
ing of attitude items were shown. 


Received June 15, 1954. 
Super points out that “Counselors working 
with clients in schools, colleges and guidance 
centers frequently comment on the large num- 
ber of cases in which Strong’s Blank merely 
confirms what one already knew from inter- 
viewing the client and, what is more, what 
the client already knew himself” (9, p. 429). 
If, in fact, the long-term validity of expressed 
interests matches that of the interest inven- 
tory, the practical psychologist may wish to 
save himself the time and cost of interest 
testing. On the other hand, if these two 
methods of learning the client’s interests have 
unequal validity, or have equal validity, but 
each is especially effective for one subtype of 
client, the practical conclusions to be drawn 
also have theoretical implications. 

The only follow-up study explicitly com- 
paring the predictive powers of expressed and 
inventoried interests is that of Wightwick 
(12). The Strong inventory, administered 
during the freshman year, predicted postcol- 
lege occupations slightly better than did the 
interests expressed during the freshman year. 
By the senior year, however, Wightwick’s 
subjects were able to express more realistic 
choices, which predicted slightly better than 
the Strong the postcollege career actually en- 
tered. This finding is consistent with studies 
(1) that point out the increasing stability and 
realism of expressed career choice as the per- 
son passes through adolescence into maturity. 

This paper offers more information and a 
longer follow-up on the validity of expressed 
interests, directly compared to the predic- 
tive power of the Strong Vocational Interest 
Blank. The circumstances favoring accuracy 
in one or the other instrument will be ex- 
amined and theoretical conclusions about the 
“interest” approach to vocational guidance 
will be drawn. 


1 This paper was written as part of the Study of 
Adult Development (formerly the Grant Study) con- 
ducted by the Department of Hygiene. 


Validity of Expressed Interests 


In the academic year 1939-1940, 63 sopho- 
mores in Harvard College were taken into the 
Study of Adult Development. This study 
had been organized for the purpose of observ- 
ing “normal” young men during college and 
the early decades of adult life. Its purposes 
and initial findings have been described by 
Heath (2). 

The 63 men were given a battery of tests 
that included the Strong Vocational Interest 
Blank. Each man was also interviewed on 
the basis of a “Social History” form that included
a question about career plans. He
also spent several hours in psychiatric inter- 
views, one or more of which touched upon his 
ideas about his future career. 

Of the 63 men, two died in World War II, 
before embarking upon any civilian career. 
One expressed no interests at all. The re- 
maining 60 both expressed interests and are 
known to have been pursuing certain careers 
in the year 1953, when this follow-up was 
made. Our findings will, therefore, be based 
on an N of 60 men. 

From the interviews we formed a list of the 
“expressed” interests of each man. Any in- 
terest mentioned was included. For instance, 
if a man was said to have “thought about 
going into medicine but doesn’t know if he is 
suited for it,” the fact that he had thought of 
that possibility for himself was sufficient to 
make us record his expression of interest. 
It not infrequently happened that the career 
thus fleetingly thought of during his early col- 
lege years was the one that he eventually 
pursued. 

A “strong” interest was especially noted if 
it was one explicitly mentioned as being very 
important to the subject (S); e.g., “the man’s 
whole drive and ambition has been to go to 
medical school.” Occasionally, men expressed 
“wistful” interests that at the time seemed 
to them to be more daydreaming than career 
planning. For example, it was written about
one man that “At one time he intended going
into medicine and is just beginning to get
over this. He states that he has no particular
feeling for medicine but he considers it
rather romantic.” It happened that this man
did enter medical school after college and is
now a physician. Some men also expressed
“negative interests.” These men said they
really did not want to enter an occupation but
felt that they probably would, because of
family pressure or their own failure to find
anything else to do. Such is the case where
“the boy says that he does not want to go
into law and is doing so merely because the
father expects one of his sons to and the boy
says that it seems that he has been elected.”


Expressed and Inventoried Interests


Table 1

Prediction of Career Choice by Expressed Interest†
(Selected Cases Only)

Case                                                                     Predictive
No.   1953 Occupation          Expressed Interests*                      Accuracy

Doctors
 7.   Surgeon                  Medicine; Industrial chem.;               Good Hit
                               Industrial mgmt.
      Pediatrics, Teaching,    Medicine (?); Medical research;           Poor Hit
      Research                 Hotel mgmt. (—); Aero mechanics;
                               Plant biology
Lawyers
16.   Lawyer                   Lawyer; Exec. scout master;               Poor Hit
                               Govt. admin.; Engineering
18.   Lawyer                   Lawyer (+)                                Good Hit
21.   Lawyer                   Teacher; College dean                     Clean Miss

Public administrators
20.   Elected official         Govt. official (+); Lawyer; Business      Good Hit

Scientists
34.   Chemist                  Physician (+); English teacher            Clean Miss
      Chemist, Teaching        Chemistry; Research chem.;                Good Hit
                               Teaching chem.
Office men
48.   Accountant               Business (—); Accountant;                 Poor Hit
                               Engineer (?); Aviator (?)
      Own business             Diplomatic service; Business; Politics;
                               Engineering; Medicine
55.   Trustee                  City manager; Administrator;              Clean Miss
                               Managing a national park; Teaching;
                               Government work
Ministers
56.   Minister                 Medicine (+); Biologist;                  Poor Hit
                               Science teacher; Minister
Teachers of letters
61.   Secondary teacher        Prep school teacher (+); Medicine (?)     Good Hit

† Source: Direct questions about future plans when S entered Grant Study in 1939 and subsequent psychiatric interviews.
* (+) indicates strong interest, (?) = wistful interest, and (—) = negative interest.


Table 1 contains, besides a list of all the
interests expressed by the 60 men in 1939, a
list of their 1953 occupations and a classification
of the predictive accuracy of the interests
expressed in the interviews. A sample designed
to include various occupations and
various classifications of predictive accuracy
is shown here.² Classification was made on


2 The entire table has been deposited with the
American Documentation Institute. Order Document
No. 4512 from the ADI Auxiliary Publications Project,
Photoduplication Service, Library of Congress,
Table 2

Validities of Expressed and Inventoried Interests:
1939-1953

              Expressed    Inventoried
Validity      Interests    Interests

Good Hit         31            27
Poor Hit         14            12
Clean Miss       15            21

Total            60            60


the following basis: If a man was 14 years 
later pursuing an occupation in which he had 
expressed strong interest, or which had been 
one of three or fewer interests of any sort 
that he expressed, he was classified as a “Good 
Hit.” “Negative” interests were regarded as 
strong ones for this purpose; the man had 
been ego-involved, if acrimonious. If he was 
in 1953 pursuing an interest that had been 
one of four or more, and cited casually, or 
wistfully, the man was rated a “Poor Hit”; 
if his 1953 occupation had not been men- 
tioned at all, he became a “Clean Miss.” 
These definitions were set up to be exactly 
analogous to the standards used by Strong 
(8) in discussing the validity of his inventory, 
and to the follow-up of the Strong inventories 
of this same group discussed in an earlier 
paper (5). 
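The classification rule can be written out as a small function. This is a sketch of the rule as stated above, not the authors' procedure: the `Interest` record, the strength labels, and the exact string matching of occupations are assumptions made for illustration (the original raters judged equivalence of occupations by hand).

```python
from dataclasses import dataclass


@dataclass
class Interest:
    occupation: str
    strength: str = "casual"  # "strong", "negative", "wistful", or "casual"


def classify(interests, occupation_1953):
    """Return "Good Hit", "Poor Hit", or "Clean Miss" for one man."""
    match = next((i for i in interests if i.occupation == occupation_1953), None)
    if match is None:
        # The 1953 occupation was never mentioned.
        return "Clean Miss"
    # Negative interests are treated as strong ones; any interest that was
    # one of three or fewer also counts as a Good Hit.
    if match.strength in ("strong", "negative") or len(interests) <= 3:
        return "Good Hit"
    # One of four or more interests, cited casually or wistfully.
    return "Poor Hit"


single_strong = [Interest("lawyer", "strong")]
never_mentioned = [Interest("teacher"), Interest("college dean")]
one_of_five = [
    Interest("medicine", "wistful"),
    Interest("medical research"),
    Interest("hotel management", "negative"),
    Interest("aero mechanics"),
    Interest("plant biology"),
]

print(classify(single_strong, "lawyer"))
print(classify(never_mentioned, "lawyer"))
print(classify(one_of_five, "medicine"))
```

The three calls illustrate one case of each class: a single strong interest that was followed, an occupation never mentioned, and a wistful interest that was one of five.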

To facilitate comparison, the accuracy of 
the interests expressed by our group of Ss 
may be seen in Table 2, side by side with the 
accuracy of their Strongs. In the follow-up 
of expressed interests, just over one-half the 
men became Good Hits, one-quarter scored 
Poor Hits, and the remainder scored Clean 
Misses. In general, simply asking a college 
sophomore to name his future career possi- 
bilities may be a useful guidance procedure! 
These expressed interests even turn out to be 
slightly (but not significantly) more valid than 
the same men’s inventoried interest scores. 
On the basis of our sample we may regard the 
validation of expressed and inventoried in- 
terests as equal. 

Strong (8) has introduced another way of 
validating an interest test: men entering an 
occupation should have higher interest in that 





Washington 25, D. C., remitting in advance $1.25 
for 35 mm. microfilm or $1.25 for 6 X 8 in. photo- 
copies. Make checks payable to Chief, Photodupli- 
cation Service, Library of Congress. 
occupation than has been demonstrated by 
men in general. Strong found this to be true 
of his test, and McArthur (5) has shown that 
this proposition applied to the Strong tests of 
the 60 men whose expressed interests are now 
being examined. Table 3 shows that this 
test of validity is passed by the expressed 
interests, as well. 

A warning about the reading of this table 
is in order. The implication cannot be drawn 
that a man who expresses a sophomore in- 
terest in becoming a physician will neces- 
sarily become one. In our group, the proba- 
bility of his doing so was about even. The 
odds on a would-be lawyer actually entering 
that profession were slightly less than even, 
while would-be public administrators, en- 
gineers, and chemists realized their plans in 
about one case out of four. Two-thirds of 
those considering the ministry actually en- 
tered the ministry in our short series of 
three men. 
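The arithmetic behind this warning can be reproduced from the legible rows of Table 3. The sketch assumes that the first percentage in each row refers to men engaged in the occupation and the second to men not engaged in it, and that head counts are the rounded products of those percentages with group sizes out of the 60 men.

```python
def chance_of_entering(n_in_occupation, pct_in, pct_out, n_total=60):
    """Of the men who expressed an interest, the share in that occupation in 1953.

    pct_in  : percent of men in the occupation who had expressed the interest
    pct_out : percent of all other men who had expressed the interest
    """
    expressed_in = round(n_in_occupation * pct_in / 100)
    expressed_out = round((n_total - n_in_occupation) * pct_out / 100)
    share = expressed_in / (expressed_in + expressed_out)
    return expressed_in, expressed_out, share


rows = [("Physician", 12, 100, 27),
        ("Lawyer", 11, 73, 23),
        ("Public admin.", 5, 60, 22)]

results = {name: chance_of_entering(n, p_in, p_out) for name, n, p_in, p_out in rows}
for name, (hit, miss, share) in results.items():
    print(f"{name}: {hit} entered, {miss} did not, share = {share:.2f}")
```

Under these assumptions the shares come out near one-half for physicians, a little under one-half for lawyers, and about one in five for public administrators, in line with the odds described in the text.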

A third check that Strong has applied to 
the validity of his test was to show that it 
gave interest scores higher for men who re- 
mained in an occupation than for men who 
entered that occupation but subsequently de- 
serted it for another. For the 60 men we are 
discussing, Strong’s proposition was demon- 
strated (5) to be true. It is less true of 
expressed interests. Of a total of 36 instances 
of change from occupation A to occupation 


Table 3

Strong’s Second Proposition Applied to
Expressed Interests

                           Percentage of Men     Percentage of Men
                           Engaged in            not Engaged in
                           Occupation            Occupation
                           Expressing            Expressing
Occupation                 Interest              Interest

Physician (N = 12)           100                   27
Lawyer (N = 11)               73                   23
Public admin. (N = 5)         60                   22
Engineer (N = 4)              75
Chemist (N = 3)
Minister (N = 2)









B, there were seven in which the expressed 
interest had been in occupation A and nine 
in which the expressed interest had been in 
occupation B. Of the remainder, 11 men had 
expressed interest in both and 9 had expressed 
no interest in either! By the criterion of 
predicting later-life changes in occupation, the 
expressed interests do not appear to be as 
valid as the Strong. 

A further test of the validity of occupa- 
tional interests is their ability to predict 
future job strains and contentment. Of 13 
men expressing discontent with their jobs in 
1953, ten had entered occupations (5) for 
which their test score was less than A. An 
apparently identical finding about expressed 
interests is that, of these same 13 discon- 
tented men, 10 are in occupations for which 
they did not express a “strong” interest. 

Similarly, of the eight men reporting (as far 
as the 1953 questionnaires had then been re- 
ceived) physical symptoms of strain resulting 
from their jobs, five were in jobs which were 
considered Poor Hits or Clean Misses for the 
Strong Vocational Interest Blank, and five 
were also in jobs classified as Poor Hits or 
Clean Misses for their expressed interests. 

In general, the interests expressed by these 
men in their sophomore year turned out to 
be good predictors of later career choice and 
adjustment. Comparisons between the validity 
of expressed and inventoried interests seldom 
favored one or the other. We are left with 
the hypothesis that expressed and inventoried 
interests are not significantly different in 
validity. We must now ask whether they are 
interchangeable. 


Public and Private School Results 


In the previous paper it was demonstrated 
that the Strong blank was much more valid 
for boys who prepared for Harvard in public 
high schools than it was for boys coming to 
Harvard from private schools. No such trend 
appears with respect to expressed interests. 
The figures appear in Table 4. A slight 
trend in the opposite direction appears: more 
public school boys express interests that are 
Clean Misses. (For the whole table, p is 
about .30.) 
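The reported probability can be checked with a chi-square test on the 3 x 2 table. The sketch assumes that the two illegible cells of Table 4 are the values implied by the row and column totals (8 Poor Hits and 5 Clean Misses in the column with 17 Good Hits); which school type heads that column does not affect the statistic. For two degrees of freedom the upper-tail probability is exactly exp(-chi2 / 2).

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: Good Hit, Poor Hit, Clean Miss; columns: the two school groups.
table4 = [[17, 14],
          [8, 6],
          [5, 10]]

chi2 = chi_square(table4)
p = math.exp(-chi2 / 2)  # valid only because df = (3 - 1) * (2 - 1) = 2
print(round(chi2, 2), round(p, 2))
```

The resulting probability is a little above .30, consistent with the figure given in the text.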

Out of the 60 cases, there are 18 in which 
the expressed interests are better predictors, 
13 in which the inventory was better, and 29 
instances in which the two are equivalent. 


Table 4

Validity of Interests Expressed by Public and
Private School Boys

              Private    Public
Validity      School     School     Total

Good Hit        17         14         31
Poor Hit         8          6         14
Clean Miss       5         10         15

Total           30         30         60


(In two-thirds of the cases of equivalence, 
both tools were accurate.) There is a clear 
relation between the superiority of one or the 
other measure of interest and the social origin 
of the subject. Using again the public-private 
school dichotomy as an index of status, we 
obtain the figures in Table 5. Where it had 
been previously demonstrated that the Strong 
was particularly valid when applied to public 
school boys, it now appears that expressed 
interests are particularly valid when one is 
working with private school boys. This de- 
spite the fact, also visible in Table 5, that 
the Strong does, indeed, “in a large number of 
cases . . . merely confirm what one already 
knew from interviewing the client.” 

By the rule of using the Strong to predict 
public school boys’ careers and expressed in- 
terests to predict private school boys’ careers, 
one makes 37 Good Hits, 11 Poor Hits, and 
12 Clean Misses. The improvement is not so 
great as might have been expected. 

What are the general implications of these 
findings? The hypothesis has been advanced 


Table 5

Relative Validities of Expressed and Inventoried
Interests Among Private and Public
School Boys

                  Expressed     Expressed and     Inventoried
                  Interests     Inventoried       Interests
Social            More          Equally           More
Origin            Accurate      Accurate          Accurate        Total

Private school
Public school

Total
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Table 6 


Relative Validities of Expressed and Inventoried 
Interests for Predicting Ambitious and 
Responsive Careers 





Inven- 
toried 
Interests 
More 
Accurate 


Expressed 
and In- 
ventoried 
Equally 
Accurate 


Expressed 
Interests 
More 
Accurate 


‘Type of 
Career 





Responsive 
Careers 
Ambitious 
Careers 8 


Total 18 29 13 


Note. 


Chi square = 5.3, df = 2, p = approx. .07. 


(3) and documented elsewhere (4) that con- 
trasting career mores are held by public and 
private school boys at Harvard, who represent 
different sections of the middle and upper 
classes. In the hope of showing why these 
social class mores affect the problem of career 
prediction, the senior author classified all 60 
histories into Miller and Form’s (6) cate- 
gories of “Ambitious Careers” and “Respon- 
sive Careers.” Miller and Form define these 
terms as follows: Ambitious Careers: Feeling 
of hope and confidence that higher occupa- 
tional goals can be attained: Responsive 
Careers: Feeling of acceptance with job 
progression which parents or relatives expect 
worker to follow. 

In our data Ambitious Careers carried 
working class sons into a lucrative position or 
the sons of nouveaux riches into desired pro- 
fessional respectability. Responsive Careers 
saw a son following his father’s vocation or an 
acceptable equivalent of it (as was the case in 
the family whose generations alternated be- 
tween medicine and law), or else entering the 
family business. If we tabulate the relation 
between type of career and relative validity 
of expressed and inventoried interests, we 
have Table 6. 

The relation shown in Table 6 is partly, of 
course, an artifact attributable to the relation 
between Responsive Careers and private school 
attendance, but not entirely so, since in- 
stances of public school boys with Responsive 
Careers and private school boys with Am- 
bitious Careers did occur. Even if Table 6 is 
regarded merely as a restatement of Table 5, 
it has the advantage of calling our attention
to this generalizable hypothesis: Men who 
can foretell what their career will be with 
more accuracy than the test can foretell it 
are men whose career choices have been re- 
stricted to those regarded as “fitting” by their 
parents. 

This is, in effect, the conclusion previously 
reached (5) from an examination of the 
Strong. There it was suggested that the 
Strong did, as claimed, measure interests and 
that the test only broke down among social 
groups who did not permit their interests to 
determine their job choices. We now have 
some evidence that people to whom the 
Strong does not apply are, indeed, responding 
to pressures other than their own interests. 
At the same time, we have seen that these 
people very often follow out the prediction of 
their expressed choices. Should we therefore 
conclude that the inventory measures interests 
while answers to the question, “What careers 
are you interested in?” do not measure in- 
terest at all? 

The question plunges us into a semantic 
dilemma. Must we demand that the man 
who follows a Responsive Career has intro- 
jected the goals supplied him by his sub- 
culture before we speak of his possessing an 
“interest” in these goals? Or are expressed 
interests, among these men, more like prophe- 
cies about themselves, and scarcely to be re- 
garded as interests at all? The four negative 
interests listed in Table 1 are certainly of 
that sort; all four were expressed by men 
who attended private school. In fact, while
both groups expressed about the same number
of casual interests, Table 7 shows the sharp
decrease in the proportion of private school
boys as the tone of the expressed feeling grows
warmer.


Table 7
Interests of Varying Warmth Expressed by Private and Public School Boys

Social Origin    Strong Interests Expressed    Wistful Interests Expressed    Negative Interests Expressed

Private school
Public school


Part of our findings may be explained
on the assumption that the Strong is
most appropriate where a career interest has 
been introjected and become a cardinal por- 
tion of the person’s ego. Certainly, the men 
following Responsive Careers do not “pursue 
an interest” with the same intensity or sense 
of progressively expanding horizons that must 
be felt by a garageman’s son who comes to 
Harvard wanting to be a teacher because that 
is the only intellectual job he knows, only to 
discover there that one can be paid to do 
research and that there is a job title “physi- 
cist,” which leads into a graduate school 
specialty in mathematical theories of motion 
that, in turn, opens up the new possibility of 
a career in astronomy. A number of such 
experiences have been described by Roe (7) 
and her biographies underline the resulting 
feelings of discovery and delight. For such
men, an inventory measuring values and
mores may well predict career possibilities 
before the man himself envisages them. 
Perhaps, then, the Strong is most useful 
where the subject’s knowledge of his future 
vocation is least adequate. The private school
boy who wants to be a doctor or lawyer
usually has a doctor or lawyer in his family, 
or numbered among his social contacts. His 
image of his future role has gained tangibility 
and has been enriched with factual detail. 
The public school boy who wants to be a 
psychologist will often have to explain to 
surprised parents the nature of the job that 
they know nothing of and that was, until 
recently, entirely unknown to him. That the 
private school boy’s prophecies should come 
true more often is probably in part explicable 
through just such a difference in available 
information. Such accurate prophecies may 
have little in them that can be properly termed 
“interest.” One striking illustration of this
trend to accurately predict career “interests” 
that offer little personal gratification is the 
fact that eight out of the eleven men in our 
group who attended the best New England 
private schools, and who were undeniably 
members of the upper class and undoubtedly 
knew the behavior of doctors, lawyers, and 
trustees from first-hand experience, predicted 
their careers with perfect accuracy—but five 
of these eight now wish to change occupations.


Conclusion 


Apparently the Strong (and perhaps the 
whole “square-peg” approach to guidance) is 
most applicable to men reared in the middle 
class success culture. The Strong seems less 
applicable for those upper-middle and upper 
class groups who possess an alternative cul- 
ture. Among this group, expressed interests 
are more accurate predictors than the test 
scores. 

The inventory predicts Ambitious Careers 
while the expressed interests predict Respon- 
sive Careers. The inventory apparently meas- 
ures values or mores that can become the 
basis for unforeseen choices; expressed “in- 
terests” apparently are more in the nature of 
attempts at prophecy and so depend for their 
validity upon the amount of realistic in- 
formation available to the subject. 

The two seem to be of equal validity, but 
because each is useful with a particular type 
of person, they are not interchangeable. 


Received June 21, 1954. 
Use of the MMPI with Student Nurses


Irwin Mahler 


Drake University 


In the past few years there has been much 
hope expressed that it will soon be possible 
to add measurements of personality to other 
screening devices for choosing candidates for 
certain professions. The Minnesota Multi- 
phasic Personality Inventory (MMPI) has 
been used by many investigators in the hope 
of discovering differential personality patterns 
or profiles among various professions. Lough 
(2, 3) has reported twice on differences be- 
tween student nurses and other college girls, 
and these reports have supported popular 
opinion that nurses are a better adjusted, 
more stable group. Beaver (1) has recently 
reported the isolation of 66 items of the 
MMPI which differentiate nursing students 
from a matched group of female education 
majors in a university. Weisgerber (5) has 
presented a new table of norms for the MMPI 
which he finds to be more appropriate for 
student nurses. 

After four years of teaching student nurses,
the writer is of the opinion that
these girls are indistinguishable from any 
other group of girls recently graduated from 
high school. Many do come from rural, and 
many from more fundamentally religious en- 
vironments, but no extra stability or more 
masculine interests have been noted. For 
these reasons, attempts to demonstrate spe- 
cific personality patterns for student nurses 
have been regarded skeptically. The in- 
vestigation reported by Beaver, therefore, has 
been of special interest, because of the claim 
that a subscale had been isolated that would 
differentiate nursing students from other stu- 
dents. This study may be considered to be 
an attempt at validating the results reported 
by Beaver (1) and Weisgerber (5). 


Method 


Subjects. The complete freshman class of student 
nurses, 81 in number, at the Iowa Methodist Hos- 
pital Nursing School of Des Moines made up the 
experimental group. The comparison group was
composed of volunteers from compulsory physical
education classes, required for all freshman women, 
at Drake University. The volunteers included 16 
girls from the College of Liberal Arts, 14 from the 
Fine Arts College, 10 from the College of Business
Administration, and 10 from the College of Educa- 
tion. Since only 50 volunteers were obtained it was 
impossible to match the two groups individually, 
but they were matched on mean age and mean ACE 
raw scores, and on the basis of being freshmen. 
The mean age of both groups was 18.4 years. The 
mean ACE raw score and SD for the nurses group 
were 114.9 and 16.95. The comparable scores for 
the Drake group were 108.9 and 23.88. There were 
no significant differences between these groups on 
the tested variables. 

Procedure. The ACE scores were available for all 
subjects because the test had been administered as 
part of the entrance requirements of both groups. 
The group MMPI was administered to all subjects 
during the second semester of their freshman year. 
The mean scores for each group on the 9 subscales 
were computed, as were the SD’s, standard errors of 
the means, and t tests of the significance of the
differences between the means. The nurses’ means 
were also compared with the means given by Weis- 
gerber for freshman student nurses. All of the means 
and standard deviations for the three groups are 
presented in Table 1. 

A special scoring stencil was prepared for the 65
items listed as differentiating by Beaver. (Beaver’s
article states that a 66-item scale was found, but
only 65 items are given.) The mean score, SD, and
standard error of the means for each group were
computed, and will be presented later. The 65 items
were then studied individually. Table 2 presents the
following data, in order: the number of the MMPI
question, the answer indicated by Beaver to be
characteristic of student nurses, the proportion of
the two experimental groups giving this answer, and
the t values of the tests of the differences between
the proportions.


Table 1
MMPI Raw Score Means and SD’s

            Drake Women       Iowa Nurses      Weisgerber Norms
             (N = 50)          (N = 81)            (N = 97)

Scale       Mean     SD       Mean     SD       Mean      SD

Hs*        13.02    4.02     11.41    2.87     12.361    3.189
D          19.30    4.65     17.98    4.43     18.227    3.483
Hy         21.34    4.66     19.98    3.70     19.247    4.296
Pd*        21.18    4.54     20.18    4.07     20.742    3.359
Mf         36.50    5.66     37.27    4.14     33.794    4.730
Pa         10.18    3.24      9.52    2.94      8.299    2.508
Pt*        28.26    5.29     28.33    5.63     27.031    4.573
Sc*        26.54    6.28     26.11    5.39     25.443    5.582
Ma*        19.82    4.26     19.71    3.55     20.969    4.263

* On these scales the K correction has been added to the
raw scores.


Table 2
Comparison of Answers to the 65 Items of the MMPI

MMPI Item Number    Response    Nurses p*    Drake p    t Value of Diff.**

534                 T           .64
13
345                 F
514                 T
69                  T
435                 T
391                 F
441                 T
144
283                 T
561                 T
81
203                 F
27
455                 T
378                 T
457                 T
232
548
183
485                 F
135                 F
456                 F
137                 T
96                  T
32                  F
73                  F
165                 F
163
370                 F
352                 F
388                 F
166                 F
356                 F
480                 F
351                 F

* All nurses’ p’s less than .50 indicate the characteristic answer in this study opposite to that given by Beaver.
** Only t values of 2.00 or higher are indicated.


Results


From an analysis of the data in Table 1 it 
is immediately obvious that the Iowa nurses 
and the Drake women present remarkably 
similar profiles. The only case in which the 
mean scores of the two groups are significantly 
different is on the hypochondriasis scale, 
where the t value is 2.48, significant at the
.05 level of confidence. On the average,
therefore, the Drake girls seem to demon- 
strate a slightly higher concern about bodily 
functions, but this seems to be the only dif- 
ference. It would seem to be impossible to 
differentiate the nursing student from the 
university student on the basis of the MMPI 
profile. 

The Iowa nurses differ from the sample of
nurses given by Weisgerber on three
scales: the hypochondriasis scale, the interest
scale, and the paranoia scale. The t values
for the significance of the differences between
the means of the two groups are 2.07, 5.23,
and 2.91, respectively. The first is significant
at the .05 level of confidence, the latter two at
the .01 level, but exactly what these differences
mean is hard to say. They are statistically
significant differences, but as a practical
matter the means are so closely alike, and so
completely within any possible “normal”
range, that they probably are unimportant.
It would seem possible to state that the two 
samples of student nurses are probably repre- 
sentative of the same population. 
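
The t values quoted in this section can be approximated from the means and SD's in Table 1. The sketch below assumes that each t is the difference between two independent means divided by the square root of the sum of the squared standard errors. The paper does not state its formula, so this is an assumption, and the function name is illustrative. The inputs are the Table 1 values as read here: Iowa nurses (N = 81), Drake women (N = 50), and Weisgerber norms (N = 97).

```python
import math

def t_between_means(mean1, sd1, n1, mean2, sd2, n2):
    """t for two independent means, using unpooled standard errors (assumed formula)."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return abs(mean1 - mean2) / se_diff

# (mean, SD, N) per group and scale, as read from Table 1.
nurses = {"Hs": (11.41, 2.87, 81), "Mf": (37.27, 4.14, 81), "Pa": (9.52, 2.94, 81)}
weisgerber = {"Hs": (12.361, 3.189, 97), "Mf": (33.794, 4.730, 97), "Pa": (8.299, 2.508, 97)}
drake_hs = (13.02, 4.02, 50)

# t values reported in the text for the nurses vs. Weisgerber comparisons.
reported = {"Hs": 2.07, "Mf": 5.23, "Pa": 2.91}

results = {
    scale: t_between_means(*nurses[scale], *weisgerber[scale])
    for scale in reported
}
t_drake_hs = t_between_means(*nurses["Hs"], *drake_hs)

for scale, t_value in results.items():
    print(scale, round(t_value, 2), "reported", reported[scale])
print("Hs, nurses vs. Drake", round(t_drake_hs, 2), "reported", 2.48)
```

Under this assumption the recomputed values fall within a few hundredths of the reported ones; small discrepancies are to be expected from rounding of the published means and SD's.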

On the 65-item scale taken from the MMPI, 
the mean score of the nurses was 36.85 and 
the SD was 4.76. The comparable scores 
from the Drake women were 34.34 and 4.27. 
The t value of the difference between the
means is 2.95, significant at the .01 level. 
Again, it is difficult to interpret the meaning 
of this difference. The range of scores for 
the nursing students was from 22 to 48, for 
the Drake students, from 24 to 48. Cer- 
tainly one could not predict which group an 
individual belonged to on the basis of the 
score on this scale. Its practical value there- 
fore is questionable. 

The results of the analysis of the differ- 
ence between the two experimental groups’ 
answers to the individual items, as presented 
in Table 2, are interesting. First, a comment 
must be made about Beaver’s “characteristic 
answer” for nurses. It is presumed that a
“characteristic answer” is one that is given
by more than 50% of the sample.
Yet for 27 of the 65 items studied, over 50%
of the nurses in this sample respond with the
answer opposite to the one indicated. For
example, Beaver says the characteristic re- 
sponse to item 514 is “True,” yet only 6% 
of this study’s nurses give this answer. 

More interesting are the data on the differ- 
entiating ability of the items. Although 
Beaver reports that each of the 65 items had 
produced t values of at least 2.00, in this
study only 13 of the items were found to
differentiate between the two experimental
groups. For 52 items, therefore, the two
groups’ responses do not differ significantly.
Strangely enough, of the 13 items which
differentiate, 4 are items which are answered
differently by this sample of nurses. For example,
Beaver says the nurses’ characteristic
answer to question 417 is “True,” and that the
item differentiates the nurses from the other
college group with a t value of 3.20. In this
study only 2% of the nurses answer “True”;
the item is working in the opposite direction
from the way indicated by Beaver, yet the t
value is 3.28. Apparently the sample of
student nurses in the present study respond to 
the MMPI items in a considerably different 
fashion than the sample cited in Beaver’s 
study. 
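
The item comparisons in Table 2 rest on tests of the difference between two proportions. A minimal sketch of one common form of that test follows, assuming a pooled standard error; the paper does not say which formula was used. The group sizes (81 nurses, 50 Drake women) and the 2% figure for the nurses come from the text, while the Drake proportion of .18 is purely hypothetical and chosen only for illustration.

```python
import math

def t_between_proportions(p1, n1, p2, n2):
    """Critical ratio for two independent proportions, pooled standard error (assumed)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_diff = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return abs(p1 - p2) / se_diff

N_NURSES = 81  # from the Method section
N_DRAKE = 50   # from the Method section

p_nurses = 0.02              # proportion of nurses answering "True", from the text
p_drake_hypothetical = 0.18  # hypothetical value, not taken from the paper

t_value = t_between_proportions(p_nurses, N_NURSES, p_drake_hypothetical, N_DRAKE)
differentiates = t_value >= 2.00  # the cutoff noted under Table 2
print(round(t_value, 2), differentiates)
```

With groups of this size, a difference of this order clears the 2.00 cutoff comfortably even when one proportion is close to zero.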


Discussion and Summary 


The group form of the MMPI was ad- 
ministered to a group of student nurses and 
a group of university women. Both groups 
were composed of freshman students, and 
they were matched for mean age and mean 
ACE raw scores. The data obtained were 
compared with the data presented in recent 
studies by Weisgerber and Beaver. Despite 
some differences, it is concluded that the Iowa 
nurses accord with the norms provided by 
Weisgerber. Although a significant difference 
between the groups in mean scores on the 
scale presented by Beaver is found, it is felt 
that the scale may not be practically useful. 
Only 13 of the 65 items significantly differ- 
entiate between the groups, and a total of 
27 of the items produce answers characteristic 
of nurses opposite to the ones cited in Beaver’s 
article. It is concluded that the scale of 65 
items needs considerably more study and 
validation before it could possibly be used to
aid in the selection of student nurses. 


Received July 14, 1954. 
Henry N. Ricciuti


This research is concerned with a study of
the relationship between certain measures of 
the performance of midshipmen at the United 
States Naval Academy and the subsequent 
performance of these individuals after gradua- 
tion as commissioned officers. Principal in- 
terest is centered upon the question of the 
extent to which undergraduate ratings of 
“aptitude-for-service,” or “leadership potential,”
are related to the quality of shipboard
performance of Naval officers during the first 
year after commissioning. 

The present study represents an extension 
of an earlier investigation (1) dealing with an 
analysis of the aptitude-for-service ratings re- 
ceived by members of the class of 1951 while 
undergraduates at the Naval Academy. The 
previous study had led to the conclusion that 
the undergraduate ratings of leadership po- 
tential constituted a satisfactory interim crite- 
rion for the validation of personality tests. 
It was recognized at the time, however, that 
to evaluate the aptitude-for-service ratings 
more fully, their relation to postgraduate of- 
ficer performance would need to be investi- 
gated. The present study was undertaken 
for this purpose when postgraduation crite- 
rion measures became available for the class 
of 1951. 


Procedure 


In carrying out the earlier study (1), a
variety of measures of undergraduate performance
had been assembled for 621 members
of the class of 1951. These measures included
ability test scores, aptitude-for-service
ratings for several different periods, and class
standings in academic courses, physical training,
and conduct (based on number of demerits).
In addition, for a subsample of 207
cases certain biographical data and several
indices of the individual’s manner of rating
others had been collected and studied. Briefly
stated, the general procedure of the present
investigation involved collecting from Navy
records all available ratings of officer performance
for the first year after graduation
on the individuals included in the earlier
study, and analyzing the relationships
between these postgraduation performance
ratings and the various undergraduate measures.
The details of this procedure are
described more fully in the paragraphs which
follow.


1 This study was conducted at the Educational
Testing Service, Princeton, N. J., while the author
was a member of the staff of that organization. The
research was supported by the Bureau of Naval
Personnel, through Contract Nonr-694(00) with the
Office of Naval Research. The views expressed in
this paper are those of the author and do not necessarily
represent the official views of the United
States Navy.

* Now at the Child Research Council, University
of Colorado School of Medicine, Denver, Colorado.


Subjects 


Large sample. Of the 621 cases included in the 
previous study it was possible to collect postgradua- 
tion performance measures on 403 individuals. (Ap- 
proximately 200 graduates were commissioned in the 
Air Force or Marine Corps, and were not included 
in the present study.) In order to keep the group 
relatively homogeneous with regard to the type of 
job performance being evaluated in the ratings, it 
was decided to include only those performance rat- 
ings covering regular shipboard duty assignments of 
Ensigns commissioned as line officers. This neces- 
sitated the elimination of 68 officers whose ratings 
covered primarily shore duties, Basic Flight Train- 
ing, or Supply Corps assignments. An additional 
11 cases were omitted because of incomplete data, 
thus leaving a total number of 324 officers con- 
stituting the large follow-up sample. 

Small sample. For the same reasons mentioned 
above, the original undergraduate subsample of 207 
cases was reduced to a total number of 98 cases 
available for analysis in the follow-up study. 


Postgraduation Performance Measures 


The measures of postgraduation officer performance
were taken directly from the official “fitness
report” forms (footnote 3) which are used in the periodical
evaluation of the quality of every officer’s job
performance. They are ordinarily filled in and submitted
by each officer’s immediate superior every
six months, as well as on every occasion when the
officer himself, or his superior, is transferred. Three
of the evaluative judgments contained on the fitness
report form were utilized in the present investigation:

Performance of duty in present assignment. For
this judgment, the reporting superior assigns marks
on a 0.0 to 4.0 scale.

Desirability. In this case, the judge is presented
with the following question: “Considering the possible
requirement of war, indicate your attitude toward
having this officer under your command.
Would you: (a) particularly desire to have him,
(b) be pleased to have him, (c) be satisfied to have
him, or (d) prefer not to have him?”

Over-all estimate. For this judgment, the reporting
senior is asked to designate the officer as
(a) outstanding, (b) excellent, (c) above average,
(d) average, or (e) below average, as compared with
other officers of his grade and approximate length
of service.


3 Formally named “Report on the Fitness of
Officers,” Form Navpers-310 (rev. 10-51).

For each individual in the study three performance 
measures were obtained by separately averaging, for 
each of the three items mentioned above, all ap- 
propriate ratings available as of October 1, 1952. 
These ratings covered the period beginning with 
graduation in June 1951. The number of ratings 
entering into each of the three composite performance
measures just mentioned varied from 1 to 4, as
follows: 1 rating, N = 61; 2 ratings, N = 170; 3
ratings, N = 88; and 4 ratings, N = 5.
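
The averaging step described above can be sketched as follows. The officer labels and marks are hypothetical and serve only to show the computation on the 0.0 to 4.0 "performance of duty" scale; how the lettered Desirability and Over-all estimate categories were converted to numbers is not stated in the text, so they are left out. The counts of officers by number of ratings are those given in the paragraph above.

```python
from statistics import mean

def composite(ratings):
    """Average all available ratings on one fitness report item for one officer."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(ratings)

# Hypothetical "performance of duty" marks (0.0 to 4.0 scale) for three officers.
duty_marks = {
    "officer_a": [3.4],
    "officer_b": [3.6, 3.8],
    "officer_c": [3.0, 3.2, 3.4],
}
composites = {name: composite(marks) for name, marks in duty_marks.items()}

# Number of officers having 1, 2, 3, and 4 ratings, as reported in the text.
officers_by_rating_count = {1: 61, 2: 170, 3: 88, 4: 5}
total_officers = sum(officers_by_rating_count.values())
print(composites, total_officers)
```

The four counts add up to the 324 officers of the large follow-up sample.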

An examination of the “over-all estimate” ratings 
given on different occasions indicated that these 
measures were reasonably stable. For the 170 in- 
dividuals on whom two sets of ratings were avail- 
able, the correlation between the first and second 
rating was found to be .70. In the case of the 88 
individuals having three sets of ratings, the correla- 
tions were .70 between the first and second rating, 
.54 between the second and third rating, and .45 for 
the first and third rating. 


Undergraduate Performance Measures 
(Large Sample) 


Aptitude-for-service ratings. These ratings con- 
stitute an evaluation of the midshipman’s aptitude 
for the military service, or his potential value to the 
Navy as an officer and leader. They consist of 
composite ratings by fellow students as well as by 
superior officers, relative to the student’s performance 
of duty, attitude, bearing and dress, and over-all 
desirability as a potential junior Naval officer. In- 
cluded in the present analysis were Midshipmen 
Composite and Officer Composite ratings for the 
1948, 1949, and 1950 summer training cruises, as 
well as for two academic periods: the second term, 
third class (sophomore) year, and the first term, 
second class (junior) year.4 


4 The freshman, sophomore, junior, and senior
years at the Naval Academy are referred to as the
4th, 3rd, 2nd, and 1st class years, respectively.
Class standings in conduct and in courses. Class
standings in Marine Engineering, History, Foreign- 
Language, and Physical Training for the third class 
academic year (1948-49) were included in the study, 
along with class standings in a Leadership course 
and in Conduct for the second class academic year 
(1949-50). An additional variable included for 
analysis was over-all class standing for the four 
years at the Naval Academy, representing a weighted 
composite of academic grades, conduct, and aptitude- 
for-service ratings. 

Ability test scores. Four of the ability tests in 
the Navy Officer Classification Battery, administered 
early in 1951, were included in the analysis: the 
Verbal, Mechanical, Mathematics, and Relative 
Movement tests. 

The biographical variables and rater characteristics 
which were studied in the smaller subsample will be 
described after the results obtained with the large 
sample have been presented and discussed. 


Analysis and Results 


The analysis of the data was aimed prin- 
cipally at determining the relationships be- 
tween (a) the various indices of undergradu- 
ate performance, as well as the biographical 
data, and (b) the three postgraduation meas- 
ures of shipboard officer performance. Re- 
sults are presented in the sections which fol- 
low, with separate discussions of the large 
sample and the smaller subsample appearing 
in that order. 

Large sample. The correlations obtained 
between the various undergraduate perform- 
ance measures and the three postgraduation 
fitness report ratings are presented in Table 
1 (see footnote 5). It will be noted at once that the three
postgraduation measures reflect very similar
aspects of officer performance, since they are
quite highly correlated with one another (r
= .75, .79, .81).

In general the fitness report ratings showed 
consistently positive though not particularly 
high correlations with undergraduate aptitude- 
for-service ratings, the values ranging from 
.14 to .33. Further examination of the table 
reveals that, for the most part, undergraduate 
ratings by midshipmen tended to be some- 
what more highly correlated with the post- 
graduation performance measures than were 
ratings made by officers. The generally higher 
correlations found for midshipmen ratings 
are probably due, at least in part, to the
greater reliability of these ratings, which
reflect the opinions of a larger number of
judges than is the case for the officer composite
ratings. At the same time, if the midshipmen
ratings represent somewhat more
valid estimates of leadership potential, this
factor might also contribute to the higher
fitness report correlations obtained for these
ratings.


5 The complete table of intercorrelations among
the undergraduate performance measures for the
original large sample can be found elsewhere (1).


Table 1

Product-Moment Correlations Between Postgraduation Performance Measures (Fitness Report Ratings),
Undergraduate Performance Measures, and Ability Test Scores

(N = 324.* Decimals omitted.)

Variables

Postgraduation performance measures (fitness report ratings)
   1. Performance of duty
   2. Desirability
   3. Over-all estimate
Undergraduate performance measures
  Aptitude-for-service ratings
    Academic
      1st term, 2/c year, Jan. 1950
         4. Midshipmen composite
         5. Officer composite
      2nd term, 3/c year, May 1949
         6. Midshipmen composite
         7. Officer composite
    Cruise
      1/c (senior) cruise, 1950
         8. Midshipmen composite
         9. Officer composite
      2/c (junior) cruise, 1949
        10. Midshipmen composite
        11. Officer composite
      3/c (soph.) cruise, 1948
        12. Midshipmen composite
        13. Officer composite
  Class standings
    3/c year, 1948-49
        14. Marine Engineering
        15. History
        16. Foreign Language
        17. Physical Training
    2/c year, 1949-50
        18. Leadership Course
        19. Conduct
  20. Over-all standing for four years (weighted composite of grades, aptitude ratings, and conduct)    22  17  26
Officer Classification Battery Test Scores
  21. Verbal Reasoning    - 04
  22. Mechanical Comprehension    08  03
  23. Mathematics    -00  04
  24. Relative Movement    -01  03

Note. For an N of 300, correlations of .15 and .11 are significant at the 1% and 5% levels, respectively.
* For variables 21-24, N = 308.

A comparison of the aptitude-for-service
ratings based upon the summer cruises with
those made during the academic year indicates
that the latter tended to yield somewhat
higher correlations with the postgraduation
fitness report measures. This was true
for both officer and midshipmen ratings.
Since the postgraduation fitness report measures
were selected so as to represent shipboard
performance, one might have expected
the undergraduate cruise ratings to be better
predictors of this performance than the academic
year ratings. Hence, the obtained
superiority of the academic ratings over the
cruise ratings represents a finding of considerable
interest.

So far as midshipmen ratings are concerned, 
the above finding may be accounted for in 
part by the fact that (a) the cruise ratings
were based on fewer judges than the ratings
made during the academic year, and (b) the
cruise ratings (for 1948 and 1950) included 
judgments made by NROTC midshipmen 
from civilian colleges. These two factors may 
have had the effect of reducing the reliability 
(in the sense of interjudge agreement) of the 
cruise ratings when compared with the ratings 
for the academic periods. At the same time, 
they probably contributed to the previously 
reported finding (1) that the cruise ratings 
were less stable from one marking period to 
another than were the academic year ratings. 
In the case of both midshipmen and officer 
ratings, the lower fitness report correlations 
yielded by summer cruise ratings might be 
partly due to the fact that these estimates 
were based on a considerably shorter period 
of observation. 

There is a further factor which might be 
mentioned in attempting to account for the 
higher fitness report correlations obtained for 
ratings based on the academic periods. So 
far as immediately apparent job character- 
istics are concerned, it seems reasonable to 
assume that the postgraduation shipboard job 
assignments have more in common with the 
tasks a midshipman is assigned to do on the 
summer cruises than with those he faces 
during the academic year. Nevertheless, it 
may be that the activities of the academic 
periods at the Naval Academy involve cer- 
tain less obvious features (such as particular 
sorts of interpersonal demands, etc.) which 
bring out more of the qualities of behavior 
or attitude predictive of later shipboard officer 
performance than is the case with the 
apparently more relevant summer cruise ac- 
tivities. 

Another point of interest in Table 1 is the 
relatively high correlations shown by the 
1948 midshipmen cruise ratings, which were 
furthest removed in time from the postgradua- 
tion criteria. Since the 1948 cruise ratings 
consisted principally of ratings made by upper 
classmen, while the midshipmen ratings for 
the other two cruises were made primarily by 
fellow classmates, these findings might be 
partly explained in terms of the hypothesis 
that ratings by upper classmen constitute 
somewhat better estimates of leadership po- 
tential than do ratings by classmates. 

An examination of the relationships be- 
tween the other variables contained in Table 
1 and postgraduation fitness report ratings 
reveals small positive correlations for class 
standings in Marine Engineering (r’s from .11 
to .18), Physical Training (.12 to .18), and 
Conduct (.10 to .15). Over-all standing for 
the four years at the Naval Academy, based 
upon a weighted composite of grades, apti- 
tude-for-service ratings and conduct, yielded 
correlations of .22, .17, and .26 with the 
three criteria of officer performance during 
the first year after graduation. No relation- 
ship was found between postgraduation per- 
formance ratings and the four ability test 
scores included in the analysis. 

It would seem desirable to relate briefly 
the general findings described above to the 
results reported in two similar follow-up 
studies of U. S. Military Academy graduates 
(4, 5). The two studies just mentioned 
yielded results agreeing generally with those 
of the present investigation, in regard to the 
relative superiority of aptitude-for-service rat- 
ings over all other measures of undergraduate 
success as predictors of postgraduation officer 
performance. Although the relative magni- 
tudes of the various correlations reported in 
the West Point studies are quite similar to 
those reported here, the absolute values of the 
correlations between officer performance and 
undergraduate aptitude-for-service ratings are 
generally higher in the Army studies. One 
factor which might contribute to these higher 
correlations is the possibly greater homogeneity 
of the Army officer groups studied, 
particularly the Infantry groups, with respect 
to the type of job performance being 
evaluated. 

Results very similar to those obtained in 
the present study have been reported in an 
investigation of the relationships between 
“buddy ratings” obtained during Air Force 
officer candidate training and measures of 
officer effectiveness following graduation, the 
reported correlation being .26 (2). 

Small sample. As previously mentioned, 
several indices reflecting the midshipman’s 
manner of rating others at the Naval Acad- 
emy were available for a subsample of 98 in- 
dividuals. These rater characteristics were: 
(a) the mean rating he assigned to others, 
(b) the standard deviation of these ratings, 
(c) the degree of agreement (correlation) be- 
tween his particular ratings of his associates 
and the composite rating of these same indi- 
viduals, and (d) the extent to which the rater 
attempted to differentiate among the four subcategories 
of aptitude-for-service. The last-mentioned 
measure was obtained by averaging, 
over all men rated, the difference between 
the highest and lowest rating on four different 
variables assigned to each man by the rater. 
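
The four rater characteristics listed above can be illustrated with a minimal sketch. It is not the authors' actual scoring procedure: the function names, the sample ratings, and in particular the choice to take each man's over-all rating as the simple mean of his four subcategory ratings are assumptions made only for illustration.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def rater_indices(subratings, composite):
    """Compute the four indices for one rater.

    subratings: one 4-tuple of subcategory ratings per man rated (hypothetical data).
    composite:  the composite (all-rater) rating of the same men, in the same order.
    """
    # Assumption: a man's over-all rating from this rater is the mean of the
    # four subcategory ratings; the source does not state how it was formed.
    overall = [mean(r) for r in subratings]
    return {
        "mean_rating": mean(overall),                      # index (a)
        "sd_rating": pstdev(overall),                      # index (b)
        "agreement_r": pearson_r(overall, composite),      # index (c)
        # index (d): highest minus lowest subcategory rating, averaged over men
        "differentiation": mean(max(r) - min(r) for r in subratings),
    }


# Hypothetical ratings assigned by one rater to four associates.
example_subratings = [(3, 4, 3, 4), (2, 2, 3, 1), (5, 4, 4, 5), (3, 3, 3, 3)]
example_composite = [3.4, 2.1, 4.6, 3.0]
print(rater_indices(example_subratings, example_composite))
```

A rater who gives every man the same rating on all four subcategories would score zero on index (d), whatever his standing on the other three.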


When the aforementioned rater character- 
istics were correlated with the postgraduation 
fitness report ratings, the only significant relationships 
found were small negative correlations 
(-.20) between two of the fitness report 
measures and the standard deviation of 
the ratings assigned by the rater. Thus, 
there appeared to be a slight tendency for 
the individual who differentiated more widely 
among his associates in assigning them leader- 
ship ratings at the Academy to receive low 
officer fitness report ratings during the first 
year after graduation. A similar, although 
less marked, correlational trend (r’s from 
-.07 to -.09) had been observed between 
corresponding variables when the individuals 
were still undergraduates (1). 

Several biographical or background charac- 
teristics of interest were also included among 
the variables studied in the small sample. 
These were: (a) age; (b) type of pre-Academy 
education, i.e., regular high school, preparatory 
school, or college training on the one 
hand, versus schooling in special pre-Annapolis 
preparatory schools, NROTC, V-12, 
or other Naval training programs; (c) number 
of months pre-Academy military service; 
(d) number of hospital or sick-quarters admissions; 
(e) number of elective extracurricular 
positions held and (f) number of 
sports awards received during the first three 
years at the Naval Academy; (g) type of ap- 
pointment to the Academy, i.e., Congressional 
appointment, in which case the element of 
competitive examinations is relatively not 
very great, versus purely competitive appointment; 
and (h) father’s occupation, whether 
civilian or member of the Armed Forces. 

An examination of the correlations obtained 
between the above variables and one of the 
postgraduation fitness report measures (“performance 
of duty”) revealed only one significant 
relationship. This was a correlation of 
.28 between the officer fitness report rating 
and age, indicating a slight tendency for the 
somewhat older officers to be rated more 
highly. The age range of the group studied 
as of October 1952 when the fitness report 
data were collected extended from twenty- 
two and one-half years to twenty-eight years. 
It is interesting to note that virtually no re- 
lationship had been found in the earlier study 
(1) between age and undergraduate ratings 
of aptitude-for-service at the Naval Academy. 
These comparative findings would seem to 
indicate, then, that certain behavioral charac- 
teristics associated with age tend to be re- 
flected favorably in the fitness report ratings 
of recently graduated officers, but they are 
not reflected in ratings of aptitude-for-service 
at the Naval Academy. 


Discussion 


The results of the present study indicate a 
definite positive relationship between a mid- 
shipman’s standing on aptitude-for-service at 
the Naval Academy and his over-all fitness as 
an officer during the first year after gradua- 
tion. Although this relationship is not a 
particularly strong one, it is of sufficient 
magnitude to warrant the conclusion that un- 
dergraduate aptitude-for-service ratings rep- 
resent an evaluation of important correlates 
of future success as a Naval Officer. 

Since the view is often expressed that peer 
ratings involve “popularity” or “pleasant surface 
personality” to a considerable degree, it 
should be mentioned briefly that one possible 
interpretation of the positive relationships 
found in this study between undergraduate 
and postgraduation ratings might be that 
both sets of ratings tend to “pick up” similar 
favorable aspects of the individual’s surface 
personality. The extent to which this might 
be the case and whether or not it constitutes 
a problem would be a matter for further con- 
sideration and study. 

It should be borne in mind that the present 
investigation dealt with officer performance 
for a relatively short period of time at the 
very beginning of the young commissioned 
officer's career. Before the long-range pre- 
dictive value of aptitude-for-service ratings 
and of the other Academy performance meas- 
ures can be determined, of course, additional 
research will need to be undertaken. Con- 
tinued longitudinal studies would permit a 
more thorough evaluation than was possible 
in this preliminary research of the relative 
importance of Academy aptitude-for-service 
ratings, academic courses, etc. in the predic- 
tion of later officer success in various types 
of duty assignments.6 The results of such 
longitudinal studies could then be utilized in 
considering the appropriateness of the rela- 
tive weights given to various undergraduate 
performance measures in obtaining the mid- 
shipman’s final composite standing for the 
four-year course of training at the Naval 
Academy. 


Summary 


The present study was concerned with an 
investigation of the relationships between cer- 
tain academic and nonacademic measures of 
undergraduate performance at the United 
States Naval Academy, and subsequent offi- 
cer fitness report ratings covering approxi- 
mately the first year after graduation. For 
324 graduates of the class of 1951, fitness report 
ratings based primarily on shipboard 
performance were found to yield consistently 
positive, but rather modest correlations 
with undergraduate aptitude-for-service ratings. 
These correlations were generally somewhat 
higher (.22 to .33) for aptitude-for-service 
ratings made during the academic 
year than for ratings based on summer training 
cruises (.14 to .33). In general, undergraduate 
ratings by midshipmen were found 
to yield higher correlations with postgraduation 
performance measures than did ratings 
by officers. The other undergraduate performance 
measures studied were found to bear 
a considerably lower relation to postgraduation 
ratings, with only grades in Marine Engineering, 
Physical Training, and Conduct 
showing some slight positive correlations (i.e., 
from .10 to .18). Of several biographical 
and background characteristics which were 
studied, only age showed a significant relationship 
(.28) to postgraduation officer performance 
measures. It is concluded that the 
undergraduate aptitude-for-service ratings reflect 
significant correlates of successful officer 
performance following graduation. 

6 For example, additional evidence of the predictive 
value of West Point aptitude-for-service ratings 
in regard to combat effectiveness has recently been 
reported (6). 


Received May 6, 1954. 

Humphreys, J. Anthony, and Traxler, Arthur 
E. Guidance services. Chicago: Science 
Research Associates, 1954. Pp. 438. $4.75. 

Thaumaturgical are the anticipations of 
guidance personnel. So fervently have they 
hoped that indeed the miracle has come to 
pass. 

Brought together here in a single volume 
are the essential threads of guidance philoso- 
phy, history, structure, and procedures so that 
a textbook now truly exists for the begin- 
ning course in guidance. Yet so thoroughly 
is the purpose accomplished that it may serve 
equally well as a text for the final course in 
guidance. 

Under the editorial craft of Clifford Froeh- 
lich, this book becomes number four in the 
SRA Professional Guidance Series, and in 
common with its predecessors, is character- 
ized by logical arrangement, concise phrasing, 
and extreme readability. The most impres- 
sive feature of Guidance Services, however, is 
the directness with which the authors have 
met the needs of those beginning a formal 
study of guidance. 

Following the lead of their editor, who in 
his own writings has defined guidance into a 
structure of services, Humphreys and Traxler 
have presented a point of view which ranges 
from the beginnings of guidance to a projected 
look at the future of guidance services, includ- 
ing straightforward attention to the sociologi- 
cal and psychological bases upon which guid- 
ance 1954 operates. 

Specific directions are given for collecting 
and recording data, for counseling and inter- 
viewing, for group techniques, for follow-up 
procedures, and for evaluation processes. Yet 
these directions are so given that the reader 
is stimulated into ideas of his own, based on 
the broad principles presented also. 

Particularly illustrative of the directness of 
these authors are the chapters devoted to 
helping students solve their problems: educational, 
vocational, and personal. So well 
written are these that the reader gets an over- 
view of how to go about helping students 
without becoming entangled in the underbrush 
of conflicting theories. This volume escapes 
the criticism deserved by many others in the 
field, that of deriving procedures from a particular 
school of thought even though that be 
eclectic in nature. The emphasis here is 
upon what the student needs to do and what 
function the counselor serves. These par- 
ticular chapters might well be required read- 
ing for more experienced counselors. 

The writers insist most persuasively that 
schools have definite responsibility for the 
proper placement of students and devote a 
chapter to this oft-neglected (or postponed) 
guidance service. 

Although relatively short, the section on the 
administration of guidance services probably 
is more helpful than several books written on 
that single topic. 

In this reviewer’s opinion, Guidance Services 
is a tribute to its field and is a remarkable ex- 
ample of the imminent maturity of guidance 
programs. 

One aspect of the book should be men- 
tioned, however. By limiting the scope to 
school practices above the elementary school, 
a more comprehensive picture is given of sec- 
ondary and postsecondary guidance, but read- 
ers anxious to be as well informed on the ele- 
mentary level will be disappointed. 

Perhaps it is in order for elementary guid- 
ance personnel to anticipate something equally 
thaumaturgic on the services which precede 
those described by Humphreys and Traxler 
in what can best be summarized as one of the 
best books on guidance yet written. 


Laurence L. Belanger 


California State Department of Education 


Klare, George R., and Buck, Byron. Know 
your reader: the scientific approach to 
readability. New York: Hermitage House, 
1954. Pp. 192. $2.95. 

This book is an important contribution to 
the growing literature on problems of com- 
munication. The three-word title shifts at- 
tention from the writer to the reader. And 
the five-word subtitle tells the communicator 
that this book will force him to consider the 
arts of rhetoric and composition and to 
plunge into the domain of the science of 
readability. It is fortunate, therefore, that 
the co-authors should come from these two 
disciplines. Buck is an editor at Macmillan 
and Klare is a psychologist who has special- 
ized in the field of readability measurement. 
The book is thus an excellent example of a 
codisciplinary attack on a common problem. 

Space limitations preclude an adequate re- 
view of this 17-chapter book. After a brief 
introduction, the first chapter outlines the 
general and specific objectives of the authors. 
The general objective is to provide writers 
with information that will help them to com- 
municate more effectively. Chapter 2 out- 
lines the wide range of applications of read- 
ability formulas together with evidence of va- 
lidity. Chapters 3, 4, and 5 deal with the 
history of readability research. Chapters 6 
and 7 discuss the reader himself. The next 
three chapters (8, 9, 10) give the technical 
background for understanding the 39 formu- 
las now in existence. Chapters 11, 12, and 
13 represent a sort of detour concerned with 
the preparation for writing, some of the aspects 
of style that make for improved readability, 
and typographical arrangements that 
promote readability. Finally, Chapters 14, 
15, and 16 deal again with the role of read- 
ability formulas, their limitations and their 
positive contributions to the art of readable 
writing. 

This book is a treatise on all phases of read- 
ability. The presentation is historically ori- 
ented and generous credit is given to all who 
have contributed to the science of readability. 
The authors are to be commended, especially, 
for their judicious selection of materials, their 
fairness in treating each topic, and their wis- 
dom in stressing limitations and pitfalls as 
well as the positive aspects of readability 
formulas. 

Since this book is the sixth volume in the 
Professional Writers Library under the gen- 
eral editorship of Gorham Munson, this re- 
viewer was curious as to the evaluation a 
famous literary editor, consultant, and the 
author of Fundamentals of Fiction Writing 
would give to this attempt. The expert opin- 
ion of Arthur Sullivant Hoffman was sought, 
therefore, and the following paragraphs from 
his pen are quoted by permission: 

The lay readability of Know Your Reader is in 
general so excellent that criticism is best concerned 
with what the book lacks of being even better. 

More space might have been devoted to a fuller 
consideration of what the readability formulas can 
undoubtedly do for professional writers. There 
would be gain in greater allowance for the infinite 
differences among both writers and readers, for ex- 
ample, in responses of the imagination to word 
stimuli. The line between fiction and non-fiction 
might well have been drawn more sharply. 

But if the present book has flaws, it is none the 
less remarkable for the many pitfalls it has avoided, 
for its lucidity and for its practical value to writers. 


In conclusion, the serious student of com- 
munication, whether he be a writer or a psy- 
chologist, will welcome this book because it 
brings together the facts and principles con- 
tained in the comprehensive 193-item bibli- 
ography. Thus, the art and science of read- 
ability has now emerged as a maturing 
discipline, and Klare and Buck deserve an 
accolade for their competence in bringing it 
to fruition. 

Donald G. Paterson 


University of Minnesota 


Mann, Floyd, and Baumgartel, Howard. Ab- 
sences and employee attitudes in an elec- 
tric power company. Ann Arbor: Institute 
for Social Research, Univer. of Michigan, 
1952. (Human Relations Program, Series 
1, Report 2.) Pp. 24. 

Mann, Floyd, and Baumgartel, Howard. The 
supervisor’s concern with costs in an elec- 
tric power company. Ann Arbor: Institute 
for Social Research, Univer. of Michigan, 
1953. (Human Relations Program, Series 
1, Report 3.) Pp. 28. 

Mann, Floyd, and Dent, James. Appraisals 
of supervisors and attitudes of their employees 
in an electric power company. Ann 
Arbor: Institute for Social Research, Uni- 
ver. of Michigan, 1954. (Human Rela- 
tions Program, Series 1, Report 4.) Pp. 39. 


These booklets were written for business 
executives, not for psychologists. They illustrate 
how reports should not be written. 
They badly miss their mark. 

A style and format popularly considered to 
be “advertising technique” is used, and the 
setup is reminiscent of grade-school level although 
the booklets are actually difficult to 
read. The type style intermixes italics, 
Gothic, and boldface type. Some lines start 
on the left side of the page, others on the 
right, and others in the middle. Single sen- 
tences are badly broken; for example, one 64- 
word sentence is set up in six sections on 
eight lines. The setup makes the sentence 
even more difficult to grasp than most 64- 
word sentences. 

The booklets are filled with bar diagrams. 
These do not always clarify or add to con- 
text material. In some cases a bar chart ap- 
pears between two lines of a single sentence 
which comprises an entire paragraph. 

The writing suffers from what this reviewer 
interprets as a misguided attempt to be mod- 
ern or popular. The authors go beyond evi- 
dence on hand in inferring causal relation- 
ships. Clumsy structure sometimes results in 
sentences which are meaningless if inter- 
preted literally. Some statements are vague 
or ambiguous. Carelessness in word choice 
has resulted in incorrect statements. 

The style of writing and printing is un- 
fortunate because the authors do have infor- 
mation which can and should be presented to 
top management as well as to psychologists. 


Executives are not professionally trained psychologists. 
However, reports to top management 
need not assume lack of intelligence, 
lack of general information, or lack of inter- 
est. Many executives are studious, and they 
do have common sense. They need not be 
spoon-fed, and they do not demand window 
dressing. 


C. E. Jurgensen 
Minneapolis Gas Company 


Hamrin, S. A. Initiating and administering 
guidance services. Bloomington, Ill.: Mc- 
Knight & McKnight Publishing Company, 
1953. Pp. 220. $3.00. 

Hamrin addressed this book to school prin- 
cipals and superintendents because “no worth- 
while program of guidance services can thrive 
or even exist long without sympathetic under- 
standing and encouragement of school ad- 
ministrators. It is hoped that this volume 
will aid in promoting such understanding and 
assistance.” To provide a framework for the 
description of guidance services, Hamrin followed 
tradition by preparing a chapter on 
each major guidance service, such as preadmission 
and orientation services, counseling 
services, etc., in addition to three introduc- 
tory chapters establishing the roles of the 
principal, teacher, and counselor in the guid- 
ance program. The result is an inclusive but 
superficial discussion of guidance services. 
The book might better have been titled, “A 
Guidance Primer for School Administrators.” 
If the reader wants such a book, he will find 
this one as good as others currently available 
for getting a hazy overview of guidance pro- 
grams. But if he wants a basic, thorough, 
and scholarly consideration of the fundamen- 
tal issues and practices in initiating and ad- 
ministering a guidance program, he will be 
disappointed. This reviewer does not know 
of such a book, yet he would like to have one 
which could be used as a text in his course, 
“Organization and Administration of Guid- 
ance Services.” Hamrin’s title built up the 
reviewer's hopes that he might omit his usual 
lecture comments about the need for an ade- 
quate text. He intends to continue his com- 
ments and use the same old text. Hamrin’s 
book, however, will be placed on the course 
bibliography. 

This book appears to be a combination of 
Hamrin’s lectures illustrated by excerpts from 
term papers written by Hamrin’s students. 
He implies as much in the foreword where he 
lists the names of 100 students. This re- 
viewer got the feeling he was reading a guid- 
ance cookbook with many untried recipes. 
Some of the illustrations (presumably sup- 
plied by students) are written in the future 
tense and are hypothetical to a fault. One 
homeroom teacher, for example, is quoted as 
planning a personal interview with each of 
the 30 students in his homeroom each se- 
mester. In writing of his plans for the inter- 
views he said, “I must direct him toward a 
feasible plan for the future. But I must re- 
member that it is my duty to help the stu- 
dent solve his own problems, not to solve 
them for him. . . . Whenever possible, or 
when it seems advisable, I shall have confer- 
ences with both the parents and the student, 
either during the home visit or at school.” 
Frankly, this reviewer was annoyed by the 
interspersion of the amateurish illustrations 
in the midst of Hamrin’s expert and mature 
discussions of the various topics. Hamrin 
has demonstrated in his previous books that 
his own experience has been rich and suffi- 
ciently varied to provide pertinent illustrative 
material. 

The best chapter in the book is entitled 
“Promoting Personal and Social Growth 
Through Group Activities.” In it Hamrin 
demonstrates again his rare ability to point 
out the implications of desirable theoretical 
points of view in such a way that the reader 
can see how to make use of them in an on- 
going public school program. Most readers 
will say it is a “practical chapter.” There 
are no quoted illustrations; the chapter is all 
Hamrin. The whole book would have been 
better if it were all Hamrin, too. 


Clifford P. Froehlich 


University of California, Berkeley 


Arsenian, Seth, and McKenzie, Francis W. 
Counseling in the YMCA. New York: As- 
sociation Press, 1954. Pp. 126. $2.00. 

In the foreword, Joseph V. Hanna states 
that this book “. . . endeavors to provide a 
sound basic orientation for responsible general 
and professional counseling and to relate 
principles and procedures to the administrative 
framework of the YMCA branch” (p. 8). 
Hanna describes it as neither a textbook nor 
a handbook but as a sourcebook written particularly 
for the YMCA secretary who is concerned 
with a social need. 

Within this modest framework, the au- 
thors have done a thoroughly creditable job. 
To some extent, however, the book’s very 
strengths are also perhaps its most serious 
weakness. Many YMCA secretaries will read 
this book and get from it the broad overview 
of counseling which the authors set out to 
provide. It is also possible that a few zeal- 
ots will feel, after reading it, that they are 
equipped to set up and operate a counseling 
agency. This, of course, is a danger inherent 
in any book dealing with counseling or other 
phases of psychology. 

Two chapters of the book deal with counseling 
itself; others are concerned with historical 
perspectives and assumptions, a theory 
of personality, levels of counseling in the 
YMCA, and the organization of counseling 
programs. Of these, the necessarily elemen- 
tary chapter on personality is the weakest. 
Admittedly a difficult task to do well, this 
chapter might well have been omitted with- 
out seriously affecting the book. Five ap- 
pendices list suggested readings, articles, 
sources of occupational information, publish- 
ers of tests, and names of YMCA counseling 
agencies approved by the American Person- 
nel and Guidance Association. 

Although the authors are very careful to 
point out that there are several levels of 
counseling, some of which can be done by the 
average YMCA worker, others of which are 
the business of professionally trained indi- 
viduals, there runs through the book a 
thread typified by the following: “Every 
YMCA secretary does counseling, whether he 
calls it by that name or not” (p. 23). In 
view of a considerable amount of recent lit- 
erature on the nature of counseling, this is an 
oversimplification which leads the reviewer to 
reiterate the criticism mentioned above that 
some zealots may misinterpret the book’s 
mission and take it as a sufficient training for 
all levels of counseling. 

Systematically, the authors draw on many 
“philosophies,” but the debt to Rogers is 
heavy and apparent. There are occasional, 
jarring statements such as the following: 
“If you can accept the client as a person 
and treat him as such, you need not worry 
about rapport” (p. 53). Later, while indi- 
cating that counseling should reduce anxiety, 
they neglect to point out that in many cases 
anxiety increases in the early stages of coun- 
seling, a fact which may come as something 
of a shock to neophytes. 

By and large, however, this book should 
serve a very useful purpose. It is hoped that 
readers will learn the important lessons 
taught and at the same time avoid the trap 
of assuming that this book equips them to do 
counseling. If it arouses curiosity and inter- 
est and sensitizes its readers to the problems 
involved, it will have served its purpose. 


John W. Gustad 


University of Maryland 

Weitzenhofer, A. M. Hypnotism: an objective 
study in suggestibility. New York: 
John Wiley & Sons, 1953. Pp. xvi + 380. 
$6.00. 


It has been said that there is a resurgence 
of interest in hypnosis with a periodicity of 
about 30 years. This book among many 
others is partly the result of the present climb 
toward the crest of the wave. However, if 
more books of its nature were written, it is a 
fair assumption that scientific interest in hyp- 
nosis would remain at or near its peak. 

Possibly no author, including Hull, has sur- 
veyed the varied experiments and the ofttimes 
vehement claims and counterclaims in the lit- 
erature on hypnosis with the degree of clear- 
sighted scientific analysis attained by Weitz- 
enhofer. 

It has long seemed strange to this reviewer 
that psychologists have paid so little atten- 
tion to hypnosis because it would seem that 
an intensive study of this phenomenon should 
throw considerable light upon the very foundations 
of human nature. In no other aspect 
of life can complicated behavior be predicted 
with such a high degree of accuracy. 


No doubt the unfortunate history of hyp- 
nosis, its association with public entertain- 
ment and the feeling that the whole thing is 
weird, have contributed to building up inhibi- 
tion against its scientific study. 

Many of these inhibitions should disappear 
if the point made by Weitzenhofer were gen- 
erally entertained, namely, that many of the 
hypnotic phenomena are of common every- 
day occurrence and that “In nearly all in- 
stances, the existing differences between hyp- 
notic and waking phenomena are found to be 
more of degree than of quality” (p. 277). 
Along the same line he makes the point that 
there is “reason to believe that no hypnotized 
person has ever significantly surpassed in a 
given function the best performance obtain- 
able from non-hypnotized individuals taken 
from a sufficiently large random population” 
(p. 286). The reviewer would agree and also 
state his belief that no hypnotized individual 
ever surpasses his own best nonhypnotized 
performance if the latter is obtained under 
optimum conditions. 

After reviewing the experimental literature, 
Weitzenhofer surveys critically the theories of 
hypnosis. He rejects all of these theories 
either in whole or part because of inade- 
quacies and then formulates one of his own. 
To the reviewer none of these theories, in- 
cluding that of the author, is very enlighten- 
ing. They all lack predictive power, which 
in the reviewer’s opinion is one of the prin- 
cipal functions of a scientific theory. The 
author says that his theory has predictive 
value but it was not evident to this reader. 

It is quite possible that the endeavor to 
formulate a theory of hypnosis is at the mo- 
ment futile. The point of the author that 
“. . . there would seem to be indications that 
hypnosis is a unique property of language, of 
man’s ability to use symbolic processes, and 
of their actual existence in man” (p. 252) is 
well taken. But what do we know psycho- 
logically of language and symbolic processes? 
Probably not enough as yet to make theoriz- 
ing about hypnosis very profitable. Hypno- 
sis also seems to involve ideomotor action, 
imagination, and attitudes, to say nothing 
of obscure personality traits. Our present 
paucity of knowledge of these complicated 
factors may make premature the attempts to 
develop theories of hypnosis. Nevertheless, 
the study of hypnosis may well add to our 
knowledge of these facets of human nature 
and thus all can proceed abreast until it may 
come about that there will need be no theory 
specifically confined to hypnosis. 

In view of the great value of this book the 
reviewer feels a bit picayunish in mentioning 
minor faults. However, he will bow to cus- 
tom and say that he feels that in portions of 
the book the sentence structure seems unduly 
involved and cumbersome. Also, several dia- 
grams are given which are intended to clarify 
but which fail noticeably in that mission, and 
the method of indicating footnotes seems odd 
and even a little irritating. 

Having discharged that unpleasant for- 
mality, the reviewer would like to close by 
indicating his belief that this book will be 
profitable reading for all who have even the 
slightest scientific interest in hypnosis, and 
for others it may generate interest if they can 
but find time to study it. 


William T. Heron 


University of Minnesota 
Jennings, E. E. Improving supervisory behavior. 
Madison: School of Commerce, Bureau of 
Business Research and Services, University 
of Wisconsin, 1954. (Wisconsin Commerce 
Studies, Vol. II, No. 1, January, 1954.) 
Pp. 35. $1.15. 

Most of this brief report is devoted to the 
description of the problems faced and tech- 
niques used in a conference-type foreman 
training program. With few exceptions, 
neither the problems nor the techniques are 
particularly new. 

A method of evaluating the training is re- 
ported. Employees working under the 40 
foremen trained rated their foremen before 
training using a listing of 23 desirable 
characteristics of foremen. At the end of 
training, employees were asked to again rate 
their foremen on the same list. No change was 
found in these ratings for the total group 
after training, except that the ratings for the 
lower half of the foremen appeared to change 
somewhat. However, no significance test is 
reported to determine whether or not the 
change is any greater than might be expected 
by chance error, nor is the possible effect of 
regression pointed out. 

In the reviewer’s opinion, this report will 
not make any substantial contribution to the 
literature available to the increasing number 
of training directors who are responsible for 
supervisory training, and who are looking for 
effective ways and means of evaluating their 
efforts. 

Theodore R. Lindbom 


Midland Cooperatives, Inc., 
Minneapolis, Minnesota 